Using gitlab omnibus 14.10.0, the alertmanager service comes up but goes down after a few seconds:
# gitlab-ctl start alertmanager
ok: run: alertmanager: (pid 18821) 0s
# gitlab-ctl status alertmanager
down: alertmanager: 1s, normally up, want up; run: log: (pid 1060) 1739s
Only the logging process (PID 1060) is still running; there is no alertmanager process itself:
# ps auxw | grep alert
root 1030 0.0 0.0 4404 1264 ? Ss 10:57 0:01 runsv alertmanager
root 1060 0.0 0.0 4548 812 ? S 10:57 0:00 svlogd -tt /var/log/gitlab/alertmanager
and there is an endless stream of repeated errors in the current log file:
2022-05-06_10:30:30.07486 level=info ts=2022-05-06T10:30:30.074Z caller=main.go:225 msg="Starting Alertmanager" version="(version=0.23.0, branch=master, revision=)"
2022-05-06_10:30:30.07490 level=info ts=2022-05-06T10:30:30.074Z caller=main.go:226 build_context="(go=go1.17.6, user=GitLab-Omnibus, date=)"
2022-05-06_10:30:30.07559 level=warn ts=2022-05-06T10:30:30.075Z caller=cluster.go:177 component=cluster err="couldn't deduce an advertise address: no private IP found, explicit advertise addr not provided"
2022-05-06_10:30:30.07647 level=error ts=2022-05-06T10:30:30.076Z caller=main.go:250 msg="unable to initialize gossip mesh" err="create memberlist: Failed to get final advertise address: No private IP address found, and explicit IP not provided"
[repeated every second]
The machine only has a public IP address and I’m not sure where or if I need to provide an explicit IP anyway. Is there an alertmanager config file I’ve not set up? Should I tell it 127.0.0.1?
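For anyone wanting to confirm the "no private IP found" part, here is a rough Ruby sketch that lists the machine's IPv4 addresses and reports whether any of them is private. It only checks the RFC 1918 ranges, which is an assumption on my part; the check Alertmanager's clustering library actually performs may cover more ranges. The helper name `private_ipv4?` is made up for this example.

```ruby
require 'ipaddr'
require 'socket'

# RFC 1918 private IPv4 ranges. This is an approximation: the real check
# inside Alertmanager's clustering code may accept additional ranges.
PRIVATE_RANGES = %w[10.0.0.0/8 172.16.0.0/12 192.168.0.0/16]
                 .map { |cidr| IPAddr.new(cidr) }
                 .freeze

# Hypothetical helper: true when the IPv4 address string is in an RFC 1918 range.
def private_ipv4?(address)
  ip = IPAddr.new(address)
  PRIVATE_RANGES.any? { |range| range.include?(ip) }
end

if __FILE__ == $PROGRAM_NAME
  # Collect the non-loopback IPv4 addresses configured on this machine.
  candidates = Socket.ip_address_list
                     .select(&:ipv4?)
                     .reject(&:ipv4_loopback?)
                     .map(&:ip_address)
  private_addrs = candidates.select { |addr| private_ipv4?(addr) }

  if private_addrs.empty?
    puts "No private IPv4 address found among: #{candidates.join(', ')}"
  else
    puts "Private IPv4 addresses: #{private_addrs.join(', ')}"
  end
end
```

If this reports no private address, that would be consistent with the warning in the log above, and an explicit advertise address is probably needed.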
I see alertmanager is related to Prometheus, and this thread describes similar errors when running in a Docker container:
https://groups.google.com/g/prometheus-users/c/nApams07R0c/m/vSnYOobrBwAJ?pli=1
(My gitlab is running natively on an Ubuntu server machine)
I’m thinking this is harmless now that I know it is Prometheus-related and not some sort of core GitLab alert manager, but I’d still like to fix it. Any ideas?
Thanks
Can you try doing this:
gitlab-ctl stop
systemctl restart gitlab-runsvdir
gitlab-ctl status
you may also wish to do:
gitlab-ctl reconfigure
just in case, and then redo the commands above. Also check /etc/gitlab/gitlab.rb for something like this:
alertmanager['flags'] = {
'cluster.advertise-address' => "127.0.0.1:9093",
'web.listen-address' => "#{node['monitoring']['alertmanager']['listen_address']}",
'storage.path' => "#{node['monitoring']['alertmanager']['home']}/data",
'config.file' => "#{node['monitoring']['alertmanager']['home']}/alertmanager.yml"
}
It should be on localhost; maybe you have something different? If so, change it accordingly, then reconfigure and restart.
EDIT:
On my test install, my alertmanager config looks like this:
# alertmanager['enable'] = true
# alertmanager['home'] = '/var/opt/gitlab/alertmanager'
# alertmanager['log_directory'] = '/var/log/gitlab/alertmanager'
# alertmanager['admin_email'] = 'admin@example.com'
# alertmanager['flags'] = {
# 'web.listen-address' => "localhost:9093",
# 'storage.path' => "/var/opt/gitlab/alertmanager/data",
# 'config.file' => "/var/opt/gitlab/alertmanager/alertmanager.yml"
# }
# alertmanager['env_directory'] = '/opt/gitlab/etc/alertmanager/env'
# alertmanager['env'] = {
# 'SSL_CERT_DIR' => "/opt/gitlab/embedded/ssl/certs/"
# }
So it is completely commented out; however, you will see that the defaults also correspond to localhost, so either that or 127.0.0.1 should do the trick if yours is configured any other way. I also run on a VPS with only a public IP, but I don’t have alertmanager listening on anything other than localhost. So you can either change it or comment the entire section out so that it reverts to localhost.
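To check quickly whether your own gitlab.rb has any alertmanager settings that are not commented out, something like the following Ruby sketch could help. The helper name `active_alertmanager_lines` is made up for this example, and it only matches the first line of each setting (for instance the opening line of a flags hash), so treat it as a rough filter and not a real parser.

```ruby
# Hypothetical helper: returns the lines of a gitlab.rb-style text that start
# an alertmanager setting and are not commented out. Commented lines begin
# with "#" after stripping whitespace, so they never match the prefix.
def active_alertmanager_lines(text)
  text.each_line
      .map(&:strip)
      .select { |line| line.start_with?("alertmanager[") }
end

if __FILE__ == $PROGRAM_NAME
  # Default path taken from the thread; pass another path as the first argument.
  path = ARGV.fetch(0, "/etc/gitlab/gitlab.rb")

  if File.readable?(path)
    lines = active_alertmanager_lines(File.read(path))
    if lines.empty?
      puts "No active alertmanager settings in #{path}"
    else
      puts "Active alertmanager settings in #{path}:"
      lines.each { |line| puts "  #{line}" }
    end
  else
    warn "Cannot read #{path}"
  end
end
```

On an install like the one above, where the whole section is commented out, this should report no active settings.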
I had no configuration for alertmanager in gitlab.rb, but adding the alertmanager['flags'] section with 127.0.0.1:9093 seems to have fixed it. I restarted it and it seems to be happily gossiping away:
caller=main.go:518 msg=Listening address=localhost:9093
caller=tls_config.go:191 msg="TLS is disabled." http2=false
caller=cluster.go:696 component=cluster msg="gossip not settled" polls=0 before=0 now=1 elapsed=2.000175201s
caller=cluster.go:688 component=cluster msg="gossip settled; proceeding" elapsed=10.003230982s
Odd that it’s taken 127.0.0.1 in the “cluster.advertise-address” setting and printed “localhost”, and that it didn’t default to any of this with no config file. I could experiment more but I’m happy now, thanks!
/etc/hosts probably resolves 127.0.0.1 to localhost. But at least it’s working 