We are seeing an intermittent issue with GitLab CE Omnibus 8.12.3, with the following symptoms:
All five of our runners seem to “lose contact” with GitLab simultaneously. If you go to https://gitlab.yourcompany.biz/admin/runners you can see that the last contact was “28 minutes” ago and climbing.
I have a GitLab runner running interactively so I can watch its console (it’s on my own desktop PC so I can study this), and there’s NO output to indicate anything is wrong.
Looking at /var/log/gitlab/gitlab-rails/production.log, I see lines like the following, which seem to indicate that register.json is being hit repeatedly from a number of correct-looking IP addresses:
Started POST "/ci/api/v1/builds/register.json" for 192.168.215.221 at 2016-10-13 09:40:05 -0500
Started POST "/ci/api/v1/builds/register.json" for 192.168.215.35 at 2016-10-13 09:40:06 -0500
… and more similar
And yet, no jobs are being run, and the time value “since last communication” keeps going up.
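For anyone who wants to check the same thing on their own server, here is a rough Python sketch that counts the register.json polls per IP address and reports when each address last polled. It assumes the log lines have exactly the format pasted above; the function names (`summarize_register_polls`, `summarize_file`) are made up for this example and are not part of GitLab.

```python
import re

# Matches Rails "Started POST" lines for the runner polling endpoint.
# Assumption: the line format is the one pasted in this post. Both straight
# and curly quotes are accepted, since pasted logs often get "smart" quotes.
LINE_RE = re.compile(
    r'Started POST ["“]/ci/api/v1/builds/register\.json["”] '
    r'for (?P<ip>\S+) at '
    r'(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [+-]\d{4})'
)


def summarize_register_polls(lines):
    """Return {ip: (poll_count, last_timestamp_string)}.

    The log is assumed to be in chronological order, so the last matching
    line seen for an IP carries its most recent poll time.
    """
    summary = {}
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a register.json poll
        ip = match.group("ip")
        count, _ = summary.get(ip, (0, None))
        summary[ip] = (count + 1, match.group("ts"))
    return summary


def summarize_file(path):
    """Read a log file and summarize the register.json polls in it."""
    with open(path, errors="replace") as handle:
        return summarize_register_polls(handle)
```

Calling `summarize_file("/var/log/gitlab/gitlab-rails/production.log")` (you will probably need sudo to read that file) returns one entry per polling address. If every runner's IP shows a recent timestamp there while the admin page still says the last contact was long ago, that would suggest the requests reach Rails but are not being recorded as runner contact.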
This condition PERSISTED even after I ran “sudo gitlab-ctl stop” and then “sudo gitlab-ctl start”. Restarting the runners also had no effect.
The condition only went away when I rebooted the Ubuntu 14.04 VM. GitLab otherwise seems to be up: you can push and pull git repos and use the whole GitLab web user interface. Only CI is affected.
I suspect the problem may be in the OS itself, so I plan to update this VM to Ubuntu 16.x LTS to see if things get more stable.