GitLab CE upgrade from 11.7.12 fails with "stack level too deep" and a Gitaly error

Hi,
we run GitLab Omnibus in Docker. We are currently on 11.7.12, and the upgrade to 11.8.* fails. Interestingly, the same upgrade works in our staging setup. The biggest difference between the two setups is that staging stores all GitLab data locally on the VM, while production stores it on a mounted NAS volume. However, I can't connect this fact to the error messages.

What happens:

The GitLab container fails to start after reconfiguration. The container's logs show the following sequence repeating in a loop:
* template[/opt/gitlab/sv/gitaly/check] action create (skipped due to only_if)
* template[/opt/gitlab/sv/gitaly/finish] action create (skipped due to only_if)
* directory[/opt/gitlab/sv/gitaly/control] action create (up to date)
* link[/opt/gitlab/init/gitaly] action create (up to date)
* file[/opt/gitlab/sv/gitaly/down] action delete (up to date)
* ruby_block[reload_log_service] action create
* ruby_block[restart_service] action nothing (skipped due to action :nothing)
* ruby_block[restart_log_service] action nothing (skipped due to action :nothing)
* ruby_block[reload_log_service] action nothing (skipped due to action :nothing)
* directory[/opt/gitlab/sv/gitaly] action create (up to date)
* template[/opt/gitlab/sv/gitaly/run] action create (up to date)
* directory[/opt/gitlab/sv/gitaly/log] action create (up to date)
* directory[/opt/gitlab/sv/gitaly/log/main] action create (up to date)
* template[/opt/gitlab/sv/gitaly/log/run] action create (up to date)
* template[/var/log/gitlab/gitaly/config] action create
- change owner from 'nobody' to 'root'
- change group from '65533' to 'root'
* directory[/opt/gitlab/sv/gitaly/env] action create (up to date)
* ruby_block[Delete unmanaged env files for gitaly service] action run (skipped due to only_if)

After looping for about 1 minute (approx. 50 iterations), starting the GitLab container fails with the following error message:
There was an error running gitlab-ctl reconfigure:
Multiple failures occurred:
* Chef::Exceptions::MultipleFailures occurred in chef run: Multiple failures occurred:
* SystemStackError occurred in delayed notification: stack level too deep
* SystemStackError occurred in delayed notification: stack level too deep
* Mixlib::ShellOut::ShellCommandFailed occurred in delayed notification: service[gitaly] (dynamically defined) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
---- Begin output of /opt/gitlab/embedded/bin/chpst -u root:root /opt/gitlab/embedded/bin/sv restart /opt/gitlab/service/gitaly ----
STDOUT: fail: /opt/gitlab/service/gitaly: unable to change to service directory: file does not exist
STDERR: 
---- End output of /opt/gitlab/embedded/bin/chpst -u root:root /opt/gitlab/embedded/bin/sv restart /opt/gitlab/service/gitaly ----
Ran /opt/gitlab/embedded/bin/chpst -u root:root /opt/gitlab/embedded/bin/sv restart /opt/gitlab/service/gitaly returned 1

So it seems that the root cause is “file does not exist”. However, /opt/gitlab/service/gitaly is a symlink to /opt/gitlab/sv/gitaly, and that target does exist in the image.
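
Since the image and the running container can differ, it may be worth checking the link from inside the container at the moment the error occurs. A minimal sketch, assuming a POSIX shell is available there; the function name check_service_link is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: report whether a service link exists, is a symlink,
# and points at a directory that actually resolves.
check_service_link() {
    link="$1"
    # Neither a file nor a (possibly dangling) symlink is present.
    if [ ! -e "$link" ] && [ ! -L "$link" ]; then
        echo "missing: $link"
        return 1
    fi
    if [ ! -L "$link" ]; then
        echo "not a symlink: $link"
        return 1
    fi
    # -d follows the link, so a dangling link fails this test.
    if [ ! -d "$link" ]; then
        echo "dangling symlink: $link -> $(readlink "$link")"
        return 1
    fi
    echo "ok: $link -> $(readlink "$link")"
}

# Example call, to be run inside the container:
#   check_service_link /opt/gitlab/service/gitaly
```

A result of "missing" or "dangling symlink" would match the "unable to change to service directory" message, while "ok" would suggest the problem lies elsewhere.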

Any pointers? Thank you.

We solved our problem and finally managed to update to 11.8! It turned out we ran into a bug with GitLab 11.8 and NFS: https://gitlab.com/gitlab-org/omnibus-gitlab/issues/4168
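
The "change owner from 'nobody' to 'root'" lines in the log above would be consistent with root squashing on the NFS export, although that is an interpretation and not something the log states directly. A minimal sketch for probing a mounted directory, assuming it is run as root (for a non-root user the owner always matches); the function name check_root_squash is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: create a probe file in a directory and compare the
# stored owner uid with the uid of the current user. On an export with root
# squashing, a file created by uid 0 is typically stored under an anonymous uid.
check_root_squash() {
    dir="$1"
    probe="$dir/.root_squash_probe.$$"
    if ! touch "$probe" 2>/dev/null; then
        echo "cannot write to $dir"
        return 2
    fi
    # ls -n prints numeric ids; the third field is the owner uid.
    owner=$(ls -ldn "$probe" | awk '{print $3}')
    rm -f "$probe"
    if [ "$owner" = "$(id -u)" ]; then
        echo "owner matches (uid $owner)"
        return 0
    fi
    echo "owner remapped: created as uid $(id -u), stored as uid $owner"
    return 1
}

# Example call as root, against the directory the log complains about:
#   check_root_squash /var/log/gitlab
```

If this reports "owner remapped" on the NAS mount but "owner matches" on a local disk, that would fit the difference between production and staging.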

A workaround suggested there is to export the volume with the option no_root_squash. Unfortunately, our storage cannot emulate this option. We therefore migrated all GitLab data (except backups) to a local disk: we stopped GitLab, changed the volume mounts in the dockerfile, copied all data to the new location with rsync, and started GitLab again. This worked perfectly.
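
For anyone attempting the same move, the stop-and-copy part could be sketched roughly as follows. This is not the exact procedure used here: the container name and both paths are placeholders, the backups directory is assumed to sit at the top level of the copied directory, and DRY_RUN=1 only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Print a command, and execute it unless DRY_RUN is set to 1.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

# Hypothetical helper. Arguments (all placeholders):
#   $1 container name, $2 old data dir on the NAS, $3 new data dir on local disk
migrate_gitlab_data() {
    container="$1"
    old_data="$2"
    new_data="$3"
    # Stop GitLab first so nothing writes to the data during the copy.
    run docker stop "$container" || return 1
    # -a keeps owners, permissions and timestamps; -H keeps hard links;
    # --numeric-ids copies uids/gids as numbers instead of mapping by name;
    # the leading slash anchors the exclude to the top of the source dir.
    run rsync -aH --numeric-ids --exclude '/backups/' "$old_data/" "$new_data/" || return 1
    echo "Copy finished. Point the volume mounts at $new_data, then start GitLab again."
}

# Example dry run with placeholder names:
#   DRY_RUN=1; migrate_gitlab_data gitlab /mnt/nas/gitlab /srv/gitlab
```

The function deliberately stops after the copy, because changing the mounts depends on how the container is defined, and a container usually has to be recreated (not just restarted) for new volume mounts to take effect.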