HTTP 502 "taking too much time to respond" returned instantly, but only occasionally

Hello,

I currently have 3 self-managed GitLab instances on 3 different servers, and they all share a common issue: each one periodically throws an HTTP 502 error, on a nearly daily basis. The page shows the “GitLab is taking too much time to respond” message, but the response is returned instantly, so it’s not a timeout problem. It looks more like a service intercommunication issue, or a downstream service returning an error immediately.

The 3 instances do have factors in common:

  • Running on either Debian 11 or 12.
  • Omnibus package for Debian. (https://packages.gitlab.com/gitlab/gitlab-ee/debian/ apt source.)
  • NGINX is not in use, as the servers already run Apache HTTPD (for unrelated reasons), which acts as the reverse proxy to GitLab Workhorse. (ProxyPassReverse http://127.0.0.1:8181)

Version details:

  • GitLab v16.7.3-ee
  • GitLab Shell 14.32.0
  • GitLab Workhorse v16.7.3
  • GitLab API v4
  • GitLab KAS v16.8.0-rc1
  • Ruby 3.1.4p223
  • Rails 7.0.8
  • PostgreSQL (main) 13.12
  • PostgreSQL (ci) 13.12
  • Redis 7.0.1

Running gitlab-ctl reconfigure or gitlab-ctl restart often resolves it after a wait of about 15-20 minutes, though that requires particular administrators to be on hand to run it. To limit disruption, a twice-daily gitlab-ctl restart is in place during quieter times; however, that’s not ideal for 2 of the instances, which ideally need 24x7 availability because their users are spread across timezones.

As I’m relatively new to GitLab-specific administration, where should I be looking to debug this issue?

Thanks,

Adam

I think you need to track down the error in the logs. It is a bit difficult because there are a lot of logs and they are verbose. Do you have a correlation ID in the error message displayed on the web page? If so, you can search based on it.

Thanks for your response.

I don’t recall the 502 page showing an error ID, though it may have been in a very faint grey that I missed. I’ll look more closely for it next time.

I remembered last night that, in addition to the 502 error, pulling from and pushing to a repository with Git fail at the same time because the internal API is unavailable. (That might be useful for determining which component is at fault.)

You can do something like jq 'select(.status == 502)' production_json.log to find relevant logs. You’ll have to dig into the resulting entries, because 502 is a very broad catch-all code for server issues.
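
If jq isn’t installed on the server, the same filter can be roughly sketched in Python. This is only a minimal sketch: it assumes the log holds one JSON object per line with a numeric "status" field, as the jq filter above implies, and the function names find_by_status and scan_file are made up for illustration.

```python
import json
from typing import Iterable, Iterator


def find_by_status(lines: Iterable[str], status: int = 502) -> Iterator[dict]:
    """Yield parsed log entries whose "status" field equals `status`.

    Assumes one JSON object per line. Blank lines and lines that are
    not valid JSON are skipped rather than treated as errors.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        # Only JSON objects can carry a "status" key.
        if isinstance(entry, dict) and entry.get("status") == status:
            yield entry


def scan_file(path: str, status: int = 502) -> list:
    """Return all matching entries from the log file at `path` (hypothetical helper)."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return list(find_by_status(fh, status))
```

Calling scan_file("production_json.log") should return the same entries as the jq command, and passing a different status value lets you check for other codes such as 500 or 503.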

Thanks for the suggestion. I just had an HTTP 502 error now. It definitely didn’t show any error reference.
I found the “production_json.log” file at “/var/log/gitlab/gitlab-rails/production_json.log”, though no HTTP 502 errors were present. (I checked both this and the regular “production.log” file.)

Is there anywhere else I can look, log-wise, for this issue? “/var/log/gitlab/gitlab-rails/production_json.log” shows nothing regarding HTTP 502.
(This still happens nearly daily.)