Getting "Internal API Unreachable" during gitlab:check


I’m running GitLab Community Edition v16.3.1. Over the past two weeks I’ve repeatedly found the system not running, and when I check the status I get this:

[~]$ sudo gitlab-rake gitlab:check
Checking GitLab subtasks …

Checking GitLab Shell …

GitLab Shell: … GitLab Shell version >= 14.26.0 ? … OK (14.26.0)
Running /opt/gitlab/embedded/service/gitlab-shell/bin/check
Internal API available: FAILED - Internal API unreachable
gitlab-shell self-check failed
Try fixing it:
Make sure GitLab is running;
Check the gitlab-shell configuration file:
sudo -u git -H editor /opt/gitlab/embedded/service/gitlab-shell/config.yml
Please fix the error above and rerun the checks.

Checking GitLab Shell … Finished

Stopping GitLab showed that the gitlab-workhorse process wasn’t happy about something:
[~]$ sudo gitlab-ctl stop
ok: down: alertmanager: 0s, normally up
ok: down: gitaly: 1s, normally up
ok: down: gitlab-exporter: 0s, normally up
ok: down: gitlab-kas: 0s, normally up
timeout: run: gitlab-workhorse: (pid 5969) 608771s, want down, got TERM
ok: down: logrotate: 0s, normally up
ok: down: nginx: 1s, normally up
ok: down: node-exporter: 0s, normally up
ok: down: postgres-exporter: 1s, normally up
ok: down: postgresql: 0s, normally up
ok: down: prometheus: 1s, normally up
ok: down: puma: 0s, normally up
ok: down: redis: 0s, normally up
ok: down: redis-exporter: 1s, normally up
ok: down: sidekiq: 0s, normally up

I kill that process ID, then restart GitLab, but I still get the “Internal API unreachable” status.

The config.yml file that the check suggests looking at appears to be the default configuration; nothing in it looks weird to me.

I need to restart the server to get it going again.

The next time this occurs, what should I look at to try to get the server back up? What should I look at that could be causing the internal API to die?

Thanks for any suggestions,

An update if anyone else is experiencing this.

It turned out the server’s file system had started to die. It was generating so many errors that the OS eventually shut off writes to the server’s few drive partitions, which killed the server. We created a new server and repaired the old one just enough to migrate the data over to the new one.
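
For anyone who hits the same symptoms, here is a rough sketch of a check that may help spot partitions that have gone read-only. It is not taken from GitLab's tooling; the function name `readonly_mounts` is made up for illustration, and it assumes a Linux host where the mount table is readable at /proc/mounts.

```shell
#!/bin/sh
# Hypothetical helper (not part of GitLab): print every mount point whose
# options include "ro", reading a mount table in /proc/mounts format:
#   device mountpoint fstype options dump pass
# Note: some pseudo or image filesystems are read-only by design, so a hit
# is a hint to investigate, not proof of a failing disk.
readonly_mounts() {
    mounts_file="${1:-/proc/mounts}"
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "ro") { print $2; break }
    }' "$mounts_file"
}

# Kernel messages are the other place to look (may require root):
#   dmesg | grep -iE "i/o error|read-only"
```

Calling `readonly_mounts` with no argument reads /proc/mounts. If a partition that GitLab writes to shows up in the output, the disk and the kernel log are worth checking before digging further into the GitLab configuration.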