Problem to solve
Frequently throughout the day, our GitLab Omnibus instance will “restart” (we have watched sudo gitlab-ctl status during the issue and nothing is down). We have dug extensively through the logs and found no direct correlation to the problem. We suspected it was related to reviewing issues or merge requests, but it has also happened while no one was using the application: someone opens a page and is hit with a 502 error.
Steps to reproduce
We do not know how to reproduce it, but have dug extensively through the logs. The issue started happening towards the end of last year and seems to be slowly getting worse. Our debugging process has gone like this:
sudo gitlab-ctl tail > gitlab.log
# Reproduce the 502 crash error (usually by spamming interactions on the clients: comments, reactions, issues, approvals, etc.)
sudo journalctl --since="DATE TIME" --until="DATE TIME" > journal.log
less gitlab.log # Grep for (502|error|status)
less journal.log
We looked for the first 502 error in the file, then looked for preceding errors. We did this multiple times. In most cases there were no errors until about 5-10 seconds before the crash, in some cases up to 30 seconds. The only errors we found (some of which only had a severity of WARN) were: IpynbDiff::InvalidNotebookError, MergeRequest.mergeError, and BlobViewer.renderError. None of these were consistent across the crashes. We also tried disabling certain features such as pages and SMTP without any success.
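The manual search described above (find the first 502, then scan backwards for errors) could be scripted roughly as follows. This is only a sketch: the function name context_before_first_502 and the default window of 200 lines are made up for illustration, and it assumes the 502 status appears in gitlab.log as a standalone number.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GitLab): show error-like lines that
# precede the first 502 in a log captured with `gitlab-ctl tail`.

# Matches 502 only when it is not part of a longer number (e.g. not 15020).
STATUS_502_RE='(^|[^0-9])502([^0-9]|$)'

context_before_first_502() {
  local file="$1" n="${2:-200}" first start
  # Line number of the first line that contains a 502; empty if there is none.
  first=$(grep -n -m1 -E "$STATUS_502_RE" "$file" | cut -d: -f1) || true
  if [ -z "$first" ]; then
    return 1
  fi
  # Start of the window: n lines before the first 502, but never before line 1.
  start=$(( first > n ? first - n : 1 ))
  # Print only the lines in the window that look like errors or warnings,
  # plus the 502 line itself.
  sed -n "${start},${first}p" "$file" \
    | grep -i -E "error|warn|fatal|${STATUS_502_RE}"
}

# Example usage (window of 500 lines):
#   context_before_first_502 gitlab.log 500
```

Because the window is counted in lines rather than seconds, a busy log may need a larger value to cover the 30 seconds before a crash.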
The message preceding the 502 errors after the crash is:
badgateway: failed to receive response: dial unix /var/opt/gitlab/gitlab-rails/sockets/gitlab.socket: connect: connection refused
The CPU/RAM do not spike when the crash occurs. We watched btop while a crash occurred: the CPU graph is at ~0% when it happens, then at about 20-30% during the “startup”. We have previously tried doubling the CPU cores and RAM, without success.
Please suggest more places we should be looking for error logs, as so far this problem seems to be silent. Simple checks like sudo gitlab-ctl status show no issues.
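Since the status output reports every service as running, one thing that can still be read from it is the per-service uptime, which would drop if a service was restarted between two checks. The sketch below assumes the usual runit-style line format (run: puma: (pid N) Ns; ...); the function names are hypothetical and not part of gitlab-ctl.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of GitLab) for reading service uptime
# from `gitlab-ctl status` output supplied on stdin.

# Print the uptime in seconds for one service, e.g. "puma".
# Assumed line format:
#   run: puma: (pid 537963) 42s; run: log: (pid 1650) 86400s
service_uptime_seconds() {
  local service="$1"
  sed -n -E "s/^run: ${service}: \(pid [0-9]+\) ([0-9]+)s.*/\1/p"
}

# Succeed (exit 0) if the uptime went backwards between two samples,
# which would mean the service was restarted in between.
uptime_went_backwards() {
  local previous="$1" current="$2"
  [ "$current" -lt "$previous" ]
}

# Example usage:
#   before=$(sudo gitlab-ctl status | service_uptime_seconds puma)
#   sleep 60
#   after=$(sudo gitlab-ctl status | service_uptime_seconds puma)
#   uptime_went_backwards "$before" "$after" && echo "puma restarted"
```

A service that reports as running but has a small uptime right after a 502 would point at that service being restarted rather than the whole instance.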
Configuration
- GitLab Omnibus v16.9.2-ee
- Red Hat Enterprise Linux 8.5 (Linux localhost.localdomain 4.18.0-348.23.1.el8_5.x86_64 #1 SMP Tue Apr 12 11:20:32 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux)
- This system is a VM running on an “identical” system.
- This VM has 4 cores and 8 GB of RAM; the CPU never peaks and RAM usage averages 4 GB.
- GitLab is the only thing installed and running on the system
- We have about 4 “active” users on the server.
gitlab.rb
external_url 'https://gitlab.DOMAIN.com'
gitlab_rails['smtp_enable'] = false
gitlab_rails['smtp_address'] = "smtp.office365.com"
gitlab_rails['smtp_port'] = 587
gitlab_rails['smtp_user_name'] = "noreply@DOMAIN.com"
gitlab_rails['smtp_password'] = "PASSWORD"
gitlab_rails['smtp_domain'] = "DOMAIN.com"
gitlab_rails['smtp_authentication'] = "login"
gitlab_rails['smtp_enable_starttls_auto'] = true
gitlab_rails['smtp_openssl_verify_mode'] = 'peer'
gitlab_rails['gitlab_email_from'] = 'noreply@DOMAIN.com'
gitlab_rails['gitlab_email_display_name'] = 'GitLab'
gitlab_rails['gitlab_email_subject_suffix'] = 'GitLab'
puma['worker_processes'] = 0 # 2
sidekiq['concurrency'] = 10
nginx['ssl_certificate'] = "/etc/gitlab/ssl/gitlab.crt"
nginx['ssl_certificate_key'] = "/etc/gitlab/ssl/gitlab.key"
pages_external_url "https://pages.DOMAIN.com"
gitlab_pages['enable'] = true
gitlab_pages['listen_proxy'] = "127.0.0.1:8090"
gitlab_pages['access_control'] = false
gitlab_pages['namespace_in_path'] = true
pages_nginx['enable'] = true
pages_nginx['redirect_http_to_https'] = true
pages_nginx['ssl_certificate'] = "/etc/gitlab/ssl/pages.DOMAIN.com.crt"
pages_nginx['ssl_certificate_key'] = "/etc/gitlab/ssl/pages.DOMAIN.com.key"
prometheus['enable'] = false
prometheus_monitoring['enable'] = false
gitaly['configuration'][:hooks][:custom_hooks_dir] = "/var/opt/gitlab/gitaly/custom_hooks"
sudo netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.1:9229 0.0.0.0:* LISTEN 1665/gitlab-workhor
tcp 0 0 127.0.0.1:8080 0.0.0.0:* LISTEN 537963/puma 6.4.0 (
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN 349202/nginx: maste
tcp 0 0 127.0.0.1:8082 0.0.0.0:* LISTEN 361886/sidekiq_expo
tcp 0 0 127.0.0.1:9236 0.0.0.0:* LISTEN 1693/gitaly
tcp 0 0 127.0.0.1:8150 0.0.0.0:* LISTEN 1644/gitlab-kas
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1230/sshd
tcp 0 0 127.0.0.1:8151 0.0.0.0:* LISTEN 1644/gitlab-kas
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN 340886/master
tcp 0 0 127.0.0.1:8153 0.0.0.0:* LISTEN 1644/gitlab-kas
tcp 0 0 127.0.0.1:8090 0.0.0.0:* LISTEN 349092/gitlab-pages
tcp 0 0 127.0.0.1:8154 0.0.0.0:* LISTEN 1644/gitlab-kas
tcp 0 0 0.0.0.0:443 0.0.0.0:* LISTEN 349202/nginx: maste
tcp 0 0 127.0.0.1:8155 0.0.0.0:* LISTEN 1644/gitlab-kas
tcp 0 0 127.0.0.1:8092 0.0.0.0:* LISTEN 361884/sidekiq 7.1.
tcp 0 0 0.0.0.0:8060 0.0.0.0:* LISTEN 349202/nginx: maste
tcp6 0 0 :::22 :::* LISTEN 1230/sshd
tcp6 0 0 ::1:25 :::* LISTEN 340886/master
udp 0 0 127.0.0.1:323 0.0.0.0:* 1172/chronyd
udp6 0 0 ::1:323 :::* 1172/chronyd
Versions
- Self-managed
sudo gitlab-rake gitlab:env:info
System information
System:
Proxy: no
Current User: git
Using RVM: no
Ruby Version: 3.1.4p223
Gem Version: 3.5.5
Bundler Version:2.5.5
Rake Version: 13.0.6
Redis Version: 7.0.15
Sidekiq Version:7.1.6
Go Version: unknown
GitLab information
Version: 16.9.2-ee
Revision: 0d71d32d321
Directory: /opt/gitlab/embedded/service/gitlab-rails
DB Adapter: PostgreSQL
DB Version: 14.10
URL: https://gitlab.DOMAIN.com
HTTP Clone URL: https://gitlab.DOMAIN.com/some-group/some-project.git
SSH Clone URL: git@gitlab.DOMAIN.com:some-group/some-project.git
Elasticsearch: no
Geo: no
Using LDAP: no
Using Omniauth: yes
Omniauth Providers:
GitLab Shell
Version: 14.33.0
Repository storages:
- default: unix:/var/opt/gitlab/gitaly/gitaly.socket
GitLab Shell path: /opt/gitlab/embedded/service/gitlab-shell
Gitaly
- default Address: unix:/var/opt/gitlab/gitaly/gitaly.socket
- default Version: 16.9.2
- default Git Version: 2.43.0
sudo gitlab-runner --version
Version: 16.8.0
Git revision: c72a09b6
Git branch: 16-8-stable
GO version: go1.21.5
Built: 2024-01-18T22:42:25+0000
OS/Arch: linux/amd64
Similar Issues
Please let me know what else I can provide to help solve this issue. We have been poring over logs for the past weeks and can’t find anything. I am required to screen anything I send, which is why I didn’t upload the logs directly.