Alertmanager status is "down", error stream in log

Using GitLab Omnibus 14.10.0, the alertmanager service comes up but goes down again after a few seconds:

# gitlab-ctl start alertmanager
ok: run: alertmanager: (pid 18821) 0s
# gitlab-ctl status alertmanager
down: alertmanager: 1s, normally up, want up; run: log: (pid 1060) 1739s

The supervisor and log processes are still running:

# ps auxw | grep alert
root      1030  0.0  0.0   4404  1264 ?        Ss   10:57   0:01 runsv alertmanager
root      1060  0.0  0.0   4548   812 ?        S    10:57   0:00 svlogd -tt /var/log/gitlab/alertmanager

and an endless stream of repeated errors in the current log file:

2022-05-06_10:30:30.07486 level=info ts=2022-05-06T10:30:30.074Z caller=main.go:225 msg="Starting Alertmanager" version="(version=0.23.0, branch=master, revision=)"
2022-05-06_10:30:30.07490 level=info ts=2022-05-06T10:30:30.074Z caller=main.go:226 build_context="(go=go1.17.6, user=GitLab-Omnibus, date=)"
2022-05-06_10:30:30.07559 level=warn ts=2022-05-06T10:30:30.075Z caller=cluster.go:177 component=cluster err="couldn't deduce an advertise address: no private IP found, explicit advertise addr not provided"
2022-05-06_10:30:30.07647 level=error ts=2022-05-06T10:30:30.076Z caller=main.go:250 msg="unable to initialize gossip mesh" err="create memberlist: Failed to get final advertise address: No private IP address found, and explicit IP not provided"
[repeated every second]

The machine only has a public IP address, and I’m not sure where, or even whether, I need to provide an explicit IP. Is there an alertmanager config file I’ve not set up? Should I tell it 127.0.0.1?

I see alertmanager is related to Prometheus, and this looks similar to errors reported when running in a Docker container:

https://groups.google.com/g/prometheus-users/c/nApams07R0c/m/vSnYOobrBwAJ?pli=1

(My GitLab is running natively on an Ubuntu server.)

I’m thinking this is harmless now that I know it’s Prometheus-related rather than some core GitLab alerting component, but I’d still like to fix it. Any ideas?
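From the error text, my understanding is that the gossip layer only auto-deduces an advertise address from the RFC 1918 private ranges, which a host with only a public IP doesn't have. A rough sketch of that deduction logic in Python (my own illustration, not GitLab or Prometheus code):

```python
import ipaddress

# RFC 1918 private IPv4 ranges - what a "private IP" scan looks for
PRIVATE_NETS = [ipaddress.ip_network(n)
                for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def deduce_advertise_address(interface_addrs):
    """Return the first RFC 1918 address found, or None if the host
    only has public (or loopback) addresses - the situation that
    triggers the "no private IP found" error above."""
    for addr in interface_addrs:
        ip = ipaddress.ip_address(addr)
        if any(ip in net for net in PRIVATE_NETS):
            return addr
    return None

# A host with only a public IP: nothing to deduce
print(deduce_advertise_address(["203.0.113.10"]))                  # None
# A typical LAN host: the private address is picked
print(deduce_advertise_address(["203.0.113.10", "10.0.0.5"]))      # 10.0.0.5
```

Note that loopback (127.0.0.1) is not in the RFC 1918 ranges either, which would explain why it isn't picked up automatically and has to be given explicitly.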

Thanks

Can you try doing this:

gitlab-ctl stop
systemctl restart gitlab-runsvdir
gitlab-ctl status

You may also wish to run:

gitlab-ctl reconfigure

just in case, and then repeat the commands above. Also check /etc/gitlab/gitlab.rb for something like this:

alertmanager['flags'] = {
  'cluster.advertise-address' => "127.0.0.1:9093",
  'web.listen-address' => "localhost:9093",
  'storage.path' => "/var/opt/gitlab/alertmanager/data",
  'config.file' => "/var/opt/gitlab/alertmanager/alertmanager.yml"
}

It should be on localhost. Maybe you have something different? If so, change it accordingly, then reconfigure and restart.
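Once it's reconfigured and restarted, you can confirm something is actually answering on the loopback port. A quick generic TCP check, sketched in Python (`port_open` is my own helper, nothing GitLab-specific):

```python
import socket

def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# After reconfigure/restart, Alertmanager should answer here
print(port_open("127.0.0.1", 9093))  # True if the service is up on that port
```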

EDIT:

On my test install, my alertmanager config looks like this:

# alertmanager['enable'] = true
# alertmanager['home'] = '/var/opt/gitlab/alertmanager'
# alertmanager['log_directory'] = '/var/log/gitlab/alertmanager'
# alertmanager['admin_email'] = 'admin@example.com'
# alertmanager['flags'] = {
#   'web.listen-address' => "localhost:9093",
#   'storage.path' => "/var/opt/gitlab/alertmanager/data",
#   'config.file' => "/var/opt/gitlab/alertmanager/alertmanager.yml"
# }
# alertmanager['env_directory'] = '/opt/gitlab/etc/alertmanager/env'
# alertmanager['env'] = {
#   'SSL_CERT_DIR' => "/opt/gitlab/embedded/ssl/certs/"
# }

so it's completely commented out. However, you can see that it also corresponds to localhost, so either that or 127.0.0.1 should do the trick if yours is configured any other way. I also run on a VPS with only a public IP, but then I don’t have alertmanager on anything other than localhost. So you can either change the value or comment the entire section out so that it reverts to localhost.

I had no configuration for alertmanager in gitlab.rb, but adding the alertmanager['flags'] section with 'cluster.advertise-address' => "127.0.0.1:9093" seems to have fixed it. After a restart it’s happily gossiping away:

caller=main.go:518 msg=Listening address=localhost:9093
caller=tls_config.go:191 msg="TLS is disabled." http2=false
caller=cluster.go:696 component=cluster msg="gossip not settled" polls=0 before=0 now=1 elapsed=2.000175201s
caller=cluster.go:688 component=cluster msg="gossip settled; proceeding" elapsed=10.003230982s

Odd that it took 127.0.0.1 in the “cluster.advertise-address” setting yet printed “localhost”, and that it didn’t default to any of this with no config in place. I could experiment more, but I’m happy now. Thanks!


/etc/hosts probably resolves 127.0.0.1 to localhost. But at least it’s working 🙂
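If you want to confirm the name/address mapping on your host, the standard library makes this easy (the outputs below assume a conventional /etc/hosts):

```python
import socket

# Forward lookup: the name "localhost" resolves to the loopback address
print(socket.gethostbyname("localhost"))      # typically 127.0.0.1

# Reverse lookup: 127.0.0.1 maps back to a "localhost" name, which would
# explain an address given as 127.0.0.1 being printed as "localhost"
print(socket.gethostbyaddr("127.0.0.1")[0])   # typically localhost
```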