Change in 11.x CE's `gitlab-ctl reconfigure` Behavior?

We’ve been running GitLab CE on AWS since June of 2017 (so, probably version 9.2.5?). To help with resiliency, we’ve run the DB components on (PGSQL flavored) RDS and placed the repositories onto an EFS share. We designed our deployment-automation to enable us to “upgrade” the GitLab installation simply by executing a CloudFormation update. Basically:

  1. a new EC2 instance is spawned
  2. CFn-init scripts install GitLab CE via the omnibus RPM
  3. CFn-init scripts copy a templated gitlab.rb file from S3 to /etc/gitlab
  4. CFn-init scripts finish out by calling gitlab-ctl reconfigure
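
For readers who want something concrete, steps 2 through 4 might look roughly like the sketch below. This is not our actual CFn-init script: the S3 URL is a made-up placeholder, the `run` and `bootstrap` helpers are hypothetical names, and it assumes the GitLab CE yum repo is already configured on the instance. With `DRY_RUN=1` (the default here) it only prints the commands.

```shell
#!/bin/sh
# Sketch of steps 2-4. DRY_RUN=1 (the default here) only prints each command.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical S3 location of the templated gitlab.rb; not from the original post.
GITLAB_RB_URL="${GITLAB_RB_URL:-s3://example-bucket/gitlab/gitlab.rb}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

bootstrap() {
    # Step 2: install the omnibus RPM (assumes the yum repo is already set up).
    run yum install -y gitlab-ce || return 1
    # Step 3: copy the templated config into place.
    run aws s3 cp "$GITLAB_RB_URL" /etc/gitlab/gitlab.rb || return 1
    # Step 4: apply the configuration. This is the call that hangs on 11.x.
    run gitlab-ctl reconfigure
}
```

Calling `bootstrap` with the default settings prints the three commands; setting `DRY_RUN=0` would actually run them.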

Up through version 10.8.7, this worked like a charm. However, with the 11.x versions, the final step (running gitlab-ctl reconfigure) hangs. If I configure CFn not to roll back on failure, I can log in after CFn has signaled a failure (due to having exceeded the configured launch timeout) to check on its status. When I look at /var/log/gitlab/reconfigure/NNNNNNNNNN.log, I consistently see output similar to:

[2018-11-15T17:28:19+00:00] INFO: execute[systemctl enable gitlab-runsvdir] ran successfully
[2018-11-15T17:28:19+00:00] INFO: cookbook_file[/usr/lib/systemd/system/gitlab-runsvdir.service] sending run action to execute[systemctl start gitlab-runsvdir] (immediate)

When I check on the systemd service, I typically find:

# systemctl status -l gitlab-runsvdir
● gitlab-runsvdir.service - GitLab Runit supervision process
   Loaded: loaded (/usr/lib/systemd/system/gitlab-runsvdir.service; enabled; vendor preset: disabled)
   Active: inactive (dead)

Attempting to manually start the service simply hangs.

If I reboot, none of the gitlab-ctl utilities function. Can’t simply uninstall and reinstall: the yum job hangs when it calls gitlab-ctl. The failed replacement EC2 is just totally wedged.
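
Since the hang only surfaces once the CFn launch timeout expires, one option is to put an explicit time limit on step 4 so the failure shows up sooner and is logged. A minimal sketch, assuming GNU coreutils `timeout` is available on the instance; `run_bounded` is a hypothetical helper name. This does not fix the underlying hang, it only makes it fail fast.

```shell
#!/bin/sh
# Run a command under a time limit (in seconds).
# Returns the command's exit status, or 124 if GNU timeout had to stop it.
run_bounded() {
    limit="$1"
    shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "timed out after ${limit}s: $*" >&2
    fi
    return "$status"
}
```

In the bootstrap script this would be called as something like `run_bounded 900 gitlab-ctl reconfigure`, where 900 is an arbitrary value that should sit below the CFn launch timeout.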

Interestingly, if I launch an EC2 using the same CFn template but prevent step 4 from happening, then log in and execute the step 4 script from an interactive shell, everything completes successfully. No hang. No errors. The resulting service comes up as expected (with all RDS-/EFS-hosted persisted data served out).

I’m assuming something’s changed in 11.x and its expectations around its invoking shell, but I haven’t been able to isolate what that might be, yet. Any help would be much appreciated.
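
One way to narrow down what differs between the two invocations is to record the calling context in each case and diff the results. A rough sketch; `snapshot_context` is a hypothetical helper and the output paths are arbitrary. It only captures the environment, the parent PID, and whether stdin is a terminal, so it may well miss whatever actually matters here.

```shell
#!/bin/sh
# Write details of the invoking context to the given file so that
# two runs (automated vs. interactive) can be compared with diff.
snapshot_context() {
    out="$1"
    {
        # Is stdin attached to a terminal?
        if [ -t 0 ]; then
            echo "## stdin is a tty: yes"
        else
            echo "## stdin is a tty: no"
        fi
        echo "## parent pid: $PPID"
        echo "## environment"
        env | sort
    } > "$out"
}
```

Calling `snapshot_context /tmp/ctx-cfn.txt` just before step 4 in the CFn-init script, and `snapshot_context /tmp/ctx-shell.txt` from the interactive shell, gives two files to compare with `diff /tmp/ctx-cfn.txt /tmp/ctx-shell.txt`.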

Just for shits-n-grins, I ultimately tried executing gitlab-ctl reconfigure as a one-shot systemd service: it hangs in the same place. So, very definitely something in 11.x doesn’t like being invoked without some type of interactive shell as a parent process. :(

I’m seeing the same problem. I’ll post again with logs. To confirm, I’m running gitlab-ctl reconfigure as part of an EC2 userdata script after downloading the gitlab.rb file.

If I log into the box, ps -aux reveals that reconfigure is still running. I can kill it, and run it again manually with no problem.