Gitaly connection refused when pushing to origin

I’m running a GitLab CE installation on Docker, in a homelab setup. It has been working well for quite some time, but lately I have been running into issues: I can no longer push commits to the origin, although clones and pulls work fine.

Here is an example of the symptom:

git push
Enumerating objects: 16, done.
Counting objects: 100% (16/16), done.
Delta compression using up to 8 threads
Compressing objects: 100% (9/9), done.
Writing objects: 100% (9/9), 4.66 KiB | 2.33 MiB/s, done.
Total 9 (delta 5), reused 0 (delta 0)
remote: error executing git hook
To ssh://gitlab.mydomain.com/some-group/proxy.git
 ! [remote rejected] master -> master (pre-receive hook declined)
error: failed to push some refs to 'ssh://git@gitlab.mydomain.com/some-group/proxy.git'

Looking at the container logs, I see this error from Gitaly at the time of the push:

==> /var/log/gitlab/gitaly/gitaly_hooks.log <==
time="2020-07-10T15:14:22Z" level=fatal msg="error when getting preReceiveHookStream client for \"pre-receive\": rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix ///var/opt/gitlab/gitaly/internal_sockets/internal.sock: connect: connection refused\""

Full disclosure: my storage volumes are mapped to NFS mounts from a NAS on my network, and I see from the release notes that NFS support is being removed in version 14. However, this is a v13 setup, as can be seen below:

# gitlab-rake gitlab:check
Checking GitLab subtasks ...

Checking GitLab Shell ...

GitLab Shell: ... GitLab Shell version >= 13.3.0 ? ... OK (13.3.0)
Running /opt/gitlab/embedded/service/gitlab-shell/bin/check
Internal API available: OK
Redis available via internal API: OK
gitlab-shell self-check successful

Checking GitLab Shell ... Finished

Checking Gitaly ...

Gitaly: ... default ... OK

Checking Gitaly ... Finished

Checking Sidekiq ...

Sidekiq: ... Running? ... yes
Number of Sidekiq processes ... 1

Checking Sidekiq ... Finished

Checking Incoming Email ...

Incoming Email: ... Reply by email is disabled in config/gitlab.yml

Checking Incoming Email ... Finished

Checking LDAP ...

LDAP: ... LDAP is disabled in config/gitlab.yml

Checking LDAP ... Finished

Checking GitLab App ...

Git configured correctly? ... yes
Database config exists? ... yes
All migrations up? ... yes
Database contains orphaned GroupMembers? ... no
GitLab config exists? ... yes
GitLab config up to date? ... yes
Log directory writable? ... yes
Tmp directory writable? ... yes
Uploads directory exists? ... yes
Uploads directory has correct permissions? ... yes
Uploads directory tmp has correct permissions? ... yes
Init script exists? ... skipped (omnibus-gitlab has no init script)
Init script up-to-date? ... skipped (omnibus-gitlab has no init script)
Projects have namespace: ... 
(project list)
Redis version >= 4.0.0? ... yes
Ruby version >= 2.5.3 ? ... yes (2.6.6)
Git version >= 2.22.0 ? ... yes (2.27.0)
Git user has default SSH configuration? ... yes
Active users: ... 2
Is authorized keys file accessible? ... yes
GitLab configured to store new projects in hashed storage? ... yes
All projects are in hashed storage? ... yes

Checking GitLab App ... Finished


Checking GitLab subtasks ... Finished

# gitlab-rake gitlab:env:info

System information
System:		
Current User:	git
Using RVM:	no
Ruby Version:	2.6.6p146
Gem Version:	2.7.10
Bundler Version:1.17.3
Rake Version:	12.3.3
Redis Version:	5.0.9
Git Version:	2.27.0
Sidekiq Version:5.2.7
Go Version:	unknown

GitLab information
Version:	13.1.4
Revision:	18c5ab32b73
Directory:	/opt/gitlab/embedded/service/gitlab-rails
DB Adapter:	PostgreSQL
DB Version:	11.7
URL:		https://gitlab.mydomain.com
HTTP Clone URL:	https://gitlab.mydomain.com/some-group/some-project.git
SSH Clone URL:	ssh://git@gitlab.mydomain.com/some-group/some-project.git
Using LDAP:	no
Using Omniauth:	yes
Omniauth Providers: 

GitLab Shell
Version:	13.3.0
Repository storage paths:
- default: 	/var/opt/gitlab/git-data/repositories
GitLab Shell path:		/opt/gitlab/embedded/service/gitlab-shell
Git:		/opt/gitlab/embedded/bin/git

I’ve looked extensively for others’ experiences but hit a dead end, which leads me to think there’s something odd about my setup, but I’m struggling to see it. As I said, I’m running NFS mounts, but they have been working fine for at least 6 months.

Any help troubleshooting would be greatly appreciated.

It’s not a solution, because I never got to understand the underlying cause (more verbose logging, please), but the following seems to work as a workaround: stop all GitLab services (gitlab-ctl stop), delete the socket file (rm /var/opt/gitlab/gitaly/internal_sockets/internal.sock), and start GitLab again (gitlab-ctl start), which creates a new socket.
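
For what it’s worth, "connection refused" on a Unix socket generally means the socket file exists but no process is accepting connections on it, which would fit a stale file left over from a previous run. Below is a rough sketch for checking that before deleting anything. socket_state is a made-up helper name, and it assumes ss (from iproute2) is available inside the container, which I have not verified for the GitLab image; without ss it simply reports unknown.

```shell
# Sketch: classify the state of a Unix socket path.
# Prints one of: missing, not-a-socket, unknown, listening, no-listener.
# socket_state is a hypothetical helper, not part of GitLab or Gitaly.
socket_state() {
    path="$1"
    if [ ! -e "$path" ]; then
        echo "missing"          # nothing exists at this path
    elif [ ! -S "$path" ]; then
        echo "not-a-socket"     # something exists, but it is not a socket file
    elif ! command -v ss >/dev/null 2>&1; then
        echo "unknown"          # ss is not installed, so listeners cannot be checked
    elif ss -xl | grep -qF -- "$path"; then
        echo "listening"        # a process has a listening Unix socket at this path
    else
        echo "no-listener"      # socket file is present but nobody is listening
    fi
}
```

Inside the container it could be called as socket_state /var/opt/gitlab/gitaly/internal_sockets/internal.sock. A result of no-listener while the Gitaly service is running would point at a leftover socket file.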

By the way, I have narrowed down how to reproduce this. My gitlab-ce instance is started from a docker-compose file. If I pull a newer image and then run docker-compose up, it restarts the GitLab instance, and this causes the socket to stop working.

I then need to run the following commands inside the instance to get it working again:

docker exec -it gitlab bash
gitlab-ctl reconfigure
gitlab-ctl stop
rm /var/opt/gitlab/gitaly/internal_sockets/*
gitlab-ctl start
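
The same steps can be wrapped in a small function and run from the Docker host after each image update. This is only a sketch: reset_gitaly_sockets and the RUN, CONTAINER and SOCKET_DIR variables are names I made up, the container name gitlab is taken from the docker exec line above, and rm -f is used instead of plain rm so that an already empty directory is not treated as an error.

```shell
# Sketch: clear Gitaly's internal sockets after the container has been restarted.
# Assumptions: the container is named "gitlab" and the socket directory is the
# one shown in the Gitaly log. All names here are illustrative.
CONTAINER="${CONTAINER:-gitlab}"
SOCKET_DIR="${SOCKET_DIR:-/var/opt/gitlab/gitaly/internal_sockets}"
# RUN is the command prefix that executes each step inside the container.
# It is deliberately left unquoted below so it splits into separate words.
# Set RUN=echo for a dry run that only prints the steps.
RUN="${RUN:-docker exec $CONTAINER}"

reset_gitaly_sockets() {
    # Chained with && so that a failing step stops the sequence.
    $RUN gitlab-ctl reconfigure &&
    $RUN gitlab-ctl stop &&
    # The glob is passed inside a quoted string so it expands in the
    # container's shell, not on the host.
    $RUN sh -c "rm -f \"$SOCKET_DIR\"/*" &&
    $RUN gitlab-ctl start
}
```

Calling reset_gitaly_sockets then runs the four steps in order. Running it once with RUN=echo first shows exactly what would be executed.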