GitLab 14.9.3 no access to repos anymore - Gitaly issue?

Hi,

after some server problems, our dockerized GitLab CE (14.9.3, self-managed) seems to have some serious issues. Currently you can log in (the web UI etc. works),
but as soon as you try to access any project you receive an HTTP 500.

We’ve already run update-permissions and gitlab-ctl reconfigure many times, but the problem persists. We also tried recreating the container, but nothing changed.

Based on the log entries it currently looks like a problem with Gitaly. What puzzles me is that it complains about missing sockets, which I would attribute to an access-rights problem, but
this doesn’t change after running update-permissions and restarting the container.

==> /var/log/gitlab/gitaly/current <==
{"level":"info","msg":"License database preloaded","time":"2023-03-08T14:50:35.485Z"}
{"level":"warning","msg":"git path not configured. Using default path resolution","resolvedPath":"/opt/gitlab/embedded/bin/git","time":"2023-03-08T14:50:35.485Z"}
{"level":"info","msg":"clearing disk cache object folder","storage":"default","time":"2023-03-08T14:50:35.489Z"}
{"level":"info","msg":"moving disk cache object folder to /var/opt/gitlab/git-data/repositories/+gitaly/tmp/diskcache1740825106","storage":"default","time":"2023-03-08T14:50:35.489Z"}
{"level":"info","msg":"finished tempdir cleaner walk","storage":"default","time":"2023-03-08T14:50:35.489Z","time_ms":1}
{"level":"info","msg":"disk cache object folder doesn't exist, no need to remove","storage":"default","time":"2023-03-08T14:50:35.490Z"}
{"level":"info","msg":"Starting file walker for /var/opt/gitlab/git-data/repositories/+gitaly/cache","path":"/var/opt/gitlab/git-data/repositories/+gitaly/cache","time":"2023-03-08T14:50:35.490Z"}
{"level":"info","msg":"Starting file walker for /var/opt/gitlab/git-data/repositories/+gitaly/state","path":"/var/opt/gitlab/git-data/repositories/+gitaly/state","time":"2023-03-08T14:50:35.490Z"}
{"level":"info","msg":"cleared all cache object files in /var/opt/gitlab/git-data/repositories/+gitaly/tmp/diskcache1740825106 after 366.165µs","storage":"default","time":"2023-03-08T14:50:35.490Z"}
{"level":"warning","msg":"[core] grpc: addrConn.createTransport failed to connect to {/var/opt/gitlab/gitaly/internal_sockets/ruby.1 /var/opt/gitlab/gitaly/internal_sockets/ruby.1 \u003cnil\u003e 0 \u003cnil\u003e}. Err: connection error: desc = \"transport: Error while dialing dial unix /var/opt/gitlab/gitaly/internal_sockets/ruby.1: connect: no such file or directory\". Reconnecting...","pid":8479,"system":"system","time":"2023-03-08T14:50:36.234Z"}
{"level":"info","msg":"starting RSS monitor","supervisor.name":"gitaly-ruby.1","supervisor.rss_threshold":209715200,"time":"2023-03-08T14:50:36.234Z"}
{"level":"warning","msg":"spawned","supervisor.args":["bundle","exec","bin/ruby-cd","/var/opt/gitlab/gitaly","/opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby","8479","/var/opt/gitlab/gitaly/internal_sockets/ruby.1"],"supervisor.name":"gitaly-ruby.1","supervisor.pid":8501,"time":"2023-03-08T14:50:36.234Z"}
{"level":"info","msg":"starting RSS monitor","supervisor.name":"gitaly-ruby.0","supervisor.rss_threshold":209715200,"time":"2023-03-08T14:50:36.234Z"}
{"level":"warning","msg":"[core] grpc: addrConn.createTransport failed to connect to {/var/opt/gitlab/gitaly/internal_sockets/ruby.0 /var/opt/gitlab/gitaly/internal_sockets/ruby.0 \u003cnil\u003e 0 \u003cnil\u003e}. Err: connection error: desc = \"transport: Error while dialing dial unix /var/opt/gitlab/gitaly/internal_sockets/ruby.0: connect: no such file or directory\". Reconnecting...","pid":8479,"system":"system","time":"2023-03-08T14:50:36.235Z"}
{"level":"warning","msg":"spawned","supervisor.args":["bundle","exec","bin/ruby-cd","/var/opt/gitlab/gitaly","/opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby","8479","/var/opt/gitlab/gitaly/internal_sockets/ruby.0"],"supervisor.name":"gitaly-ruby.0","supervisor.pid":8503,"time":"2023-03-08T14:50:36.235Z"}
{"address":"/var/opt/gitlab/gitaly/gitaly.socket","level":"info","msg":"listening at unix address","time":"2023-03-08T14:50:36.248Z"}
{"address":"/var/opt/gitlab/gitaly/internal_sockets/internal_8479.sock","level":"info","msg":"listening at unix address","time":"2023-03-08T14:50:36.249Z"}
{"error":"signal: killed","level":"warning","msg":"exited","supervisor.args":["bundle","exec","bin/ruby-cd","/var/opt/gitlab/gitaly","/opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby","8479","/var/opt/gitlab/gitaly/internal_sockets/ruby.0"],"supervisor.name":"gitaly-ruby.0","time":"2023-03-08T14:50:36.252Z"}
{"error":"signal: killed","level":"warning","msg":"exited","supervisor.args":["bundle","exec","bin/ruby-cd","/var/opt/gitlab/gitaly","/opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby","8479","/var/opt/gitlab/gitaly/internal_sockets/ruby.1"],"supervisor.name":"gitaly-ruby.1","time":"2023-03-08T14:50:36.254Z"}
{"error":"unable to start the bootstrap: can't create new listener: listen tcp: lookup localhost on [::1]:53: dial udp [::1]:53: connect: cannot assign requested address","level":"error","msg":"shutting down","time":"2023-03-08T14:50:36.255Z"}
{"level":"info","msg":"Gitaly stopped","time":"2023-03-08T14:50:36.255Z"}
{"level":"warning","msg":"forwarding signal","pid":8473,"process":8479,"signal":17,"time":"2023-03-08T14:50:36.260Z","wrapper":8473}
{"error":"os: process already finished","level":"error","msg":"can't forward the signal","pid":8473,"process":8479,"signal":17,"time":"2023-03-08T14:50:36.260Z","wrapper":8473}
{"level":"error","msg":"wrapper for process shutting down","pid":8473,"process":8479,"time":"2023-03-08T14:50:36.289Z","wrapper":8473}
{"level":"info","msg":"Wrapper started","pid":8504,"time":"2023-03-08T14:50:36.301Z","wrapper":8504}
{"level":"info","msg":"finding process","pid":8504,"pid_file":"/var/opt/gitlab/gitaly/gitaly.pid","time":"2023-03-08T14:50:36.301Z","wrapper":8504}
{"error":"open /var/opt/gitlab/gitaly/gitaly.pid: no such file or directory","level":"error","msg":"find process","pid":8504,"time":"2023-03-08T14:50:36.301Z","wrapper":8504}
{"level":"info","msg":"spawning a process","pid":8504,"time":"2023-03-08T14:50:36.302Z","wrapper":8504}
{"level":"info","msg":"monitoring process","pid":8504,"process":8510,"time":"2023-03-08T14:50:36.302Z","wrapper":8504}
time="2023-03-08T14:50:36Z" level=info msg="Starting Gitaly" version="Gitaly, version 14.9.3"
{"latencies":[0.001,0.005,0.025,0.1,0.5,1,10,30,60,300,1500],"level":"info","msg":"grpc prometheus histograms enabled","time":"2023-03-08T14:50:36.953Z"}
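
Most of the entries above are info or warning lines, so the error-level ones are easy to overlook. A minimal Python sketch for pulling them out of such a log follows. The helper name `error_entries` is hypothetical (it is not part of GitLab or Gitaly), and it assumes one JSON object per line, skipping everything else, such as the `==> ... <==` header and plain-text lines.

```python
import json


def error_entries(lines):
    """Yield (time, msg, error) for each JSON log line whose level is "error"."""
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # not a JSON object: tail header or plain-text log line
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or otherwise malformed line
        if entry.get("level") == "error":
            yield entry.get("time"), entry.get("msg"), entry.get("error")
```

Passing it an open file handle for /var/log/gitlab/gitaly/current (for example `for row in error_entries(open(path)): print(row)`) should surface entries like the "shutting down" line with its "unable to start the bootstrap" error, next to the wrapper errors.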

Does anybody have an idea why this could occur?

Thanks in advance

Please share the command or configuration used to start the container, the base OS version, and the container engine involved. Maybe there is a problem with filesystem mounts, sockets, SELinux, etc. that causes this behaviour.

Hi,

thanks for helping. The base specs are:

OS: openSUSE Leap 15.2 - SELinux disabled
Docker: 19.03.11, build 42e35e61f352

The container is started via docker-compose (GitLab is now at 14.10.5):

web:
  image: 'gitlab/gitlab-ce:14.10.5-ce.0'
  restart: "no" 
  hostname: 'gitlab-old.lan'
  environment:
    GITLAB_SKIP_UNMIGRATED_DATA_CHECK: "true"
    GITLAB_OMNIBUS_CONFIG: |
      external_url 'https://gitlab-old.lan'
      gitlab_rails['gitlab_shell_ssh_port'] = 222
      gitlab_rails['smtp_enable'] = true;
      gitlab_rails['smtp_address'] = '<IP>';
      gitlab_rails['smtp_port'] = 25;
      gitlab_rails['smtp_domain'] = '<DOMAIN>';
      gitlab_rails['smtp_tls'] = false;
      gitlab_rails['smtp_openssl_verify_mode'] = 'none'
      gitlab_rails['smtp_enable_starttls_auto'] = false
      gitlab_rails['smtp_ssl'] = false
      gitlab_rails['smtp_force_ssl'] = false
  ports:
    - '172.16.1.74:443:443'
    - '222:22'
    - '5000:5000'
    - '5555:5555'
  volumes:
    - '/web/shared/docker/gitlab-old/config:/etc/gitlab'
    - '/web/shared/docker/gitlab-old/logs:/var/log/gitlab'
    - '/web/shared/docker/gitlab-old/data:/var/opt/gitlab'

gitlab dir

total 16
drwxrwxrwx 1  998  998   0 Jun 20  2022 .bundle
-rw-r--r-- 1  998  998 371 Jun 20  2022 .gitconfig
drwx------ 1  998  998 202 Jun 20  2022 .ssh
drwxrwxrwx 1  992  992  40 Jun 20  2022 alertmanager
drwx------ 1  998 root 460 Jun 20  2022 backups
-rwxrwxrwx 1 root root  38 Jun 20  2022 bootstrapped
drwxrwxrwx 1 root root   0 Jun 20  2022 crond
drwx------ 1  998  998  24 Jun 20  2022 git-data
drwx------ 1  998  998  98 Mar  9 11:51 gitaly   <--- seems right
drwxr-xr-x 1  998 root  12 Jun 20  2022 gitlab-ci
drwxrwxrwx 1 root root 108 Jun 20  2022 gitlab-exporter
drwxrwxrwx 1  998 root 184 Mar  8 09:46 gitlab-kas
drwxrwxrwx 1 root root 104 Jun 20  2022 gitlab-monitor
drwxr-xr-x 1  998  998 148 Mar  9 11:35 gitlab-rails
drwx------ 1  998  998  30 Mar  9 11:35 gitlab-shell
drwxr-x--- 1  998  999  50 Mar  9 11:35 gitlab-workhorse
drwxrwxrwx 1 root root  74 Jun 20  2022 grafana
drwx------ 1 root root  82 Mar  9 11:45 logrotate
drwxr-x--- 1 root  999 176 Mar  9 11:35 nginx
drwxrwxrwx 1 root root  36 Jun 20  2022 node-exporter
drwxrwxrwx 1 root root  24 Jun 20  2022 postgres-exporter
drwxrwxrwx 1  996  996  24 Jun 20  2022 postgresql
drwxrwxrwx 1  992  992  46 Jun 20  2022 prometheus
-rwxrwxrwx 1 root root 170 Mar  9 11:35 public_attributes.json
drwxr-x--- 1  997  998  60 Mar  9 11:50 redis
drwxrwxrwx 1  993 root  72 Jun 20  2022 registry
drwxrwxrwx 1 root root   0 Jun 20  2022 test
-rwxrwxrwx 1 root root  40 Mar  9 09:56 trusted-certs-directory-hash

gitaly dir

total 12
-rwxrwxrwx 1  998 998   63 Jun 20  2022 RUBY_VERSION
-rwxrwxrwx 1  998 998   72 Mar  9 09:56 VERSION
-rw-r----- 1 root 998 1039 Mar  9 09:56 config.toml
srwxr-xr-x 1  998 998    0 Mar  9 11:52 gitaly.socket
drwx------ 1  998 998  250 Mar  9 11:52 run

The following messages are strange (it seems Gitaly isn’t running or couldn’t create the PID file):

{"level":"info","msg":"Wrapper started","pid":6276,"time":"2023-03-09T10:56:59.513Z","wrapper":6276}
{"level":"info","msg":"finding process","pid":6276,"pid_file":"/var/opt/gitlab/gitaly/gitaly.pid","time":"2023-03-09T10:56:59.513Z","wrapper":6276}
{"error":"open /var/opt/gitlab/gitaly/gitaly.pid: no such file or directory","level":"error","msg":"find process","pid":6276,"time":"2023-03-09T10:56:59.513Z","wrapper":6276}
{"level":"warning","msg":"[core] grpc: addrConn.createTransport failed to connect to {/var/opt/gitlab/gitaly/run/gitaly-6250/sock.d/ruby.0 /var/opt/gitlab/gitaly/run/gitaly-6250/sock.d/ruby.0 \u003cnil\u003e 0 \u003cnil\u003e}. Err: connection error: desc = \"transport: Error while dialing dial unix /var/opt/gitlab/gitaly/run/gitaly-6250/sock.d/ruby.0: connect: no such file or directory\". Reconnecting...","pid":6250,"system":"system","time":"2023-03-09T10:56:58.676Z"}
{"level":"warning","msg":"[core] grpc: addrConn.createTransport failed to connect to {/var/opt/gitlab/gitaly/run/gitaly-6250/sock.d/ruby.1 /var/opt/gitlab/gitaly/run/gitaly-6250/sock.d/ruby.1 \u003cnil\u003e 0 \u003cnil\u003e}. Err: connection error: desc = \"transport: Error while dialing dial unix /var/opt/gitlab/gitaly/run/gitaly-6250/sock.d/ruby.1: connect: no such file or directory\". Reconnecting...","pid":6250,"system":"system","time":"2023-03-09T10:56:58.676Z"}

Additional info: when I create the PID file manually, the following message appears:

{"level":"info","msg":"finding process","pid":10860,"pid_file":"/var/opt/gitlab/gitaly/gitaly.pid","time":"2023-03-09T11:13:51.117Z","wrapper":10860}
{"error":"strconv.Atoi: parsing \"\": invalid syntax","level":"error","msg":"find process","pid":10860,"time":"2023-03-09T11:13:51.117Z","wrapper":10860}

So it seems to find the PID file, but the file is empty because the process couldn’t write its PID into it?
Normally I would say this is a problem with the access rights, but the user git has full access to the directory.
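
The `strconv.Atoi: parsing "": invalid syntax` text above is what Go's integer parser reports for an empty string, which fits an empty PID file. A rough Python analogue of that check is sketched below. It is not Gitaly's actual code, the helper name `read_pid_file` is made up for illustration, and stripping whitespace is an assumption of this sketch.

```python
from pathlib import Path


def read_pid_file(path):
    """Return the PID stored in the file, or raise ValueError if it holds no number."""
    text = Path(path).read_text().strip()
    if not text:
        # Comparable to the strconv.Atoi failure on "" seen in the log
        raise ValueError(f"{path} exists but is empty")
    return int(text)  # also raises ValueError for non-numeric content
```

A missing file raises FileNotFoundError instead, which corresponds to the earlier "no such file or directory" message, so the two cases can be told apart.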

Running SANITIZE=true gitlab-rake gitlab:check inside the container shows:

Gitaly: ... default ... FAIL: 14:failed to connect to all addresses. debug_error_string:{"created":"@1678361727.968115173","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3093,"referenced_errors":[{"created":"@1678361727.968113396","description":"failed to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":163,"grpc_status":14}]}

Could this be a result of the missing PID file and the Gitaly processes not starting?
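
For reference, gRPC status code 14 is UNAVAILABLE, meaning the client could not reach the server at all, which would be consistent with Gitaly not running. A small sketch for pulling the status codes out of a `debug_error_string` like the one above follows. The helper `grpc_statuses` is hypothetical, and the name table is only a subset of the standard gRPC codes.

```python
import json

# Subset of the standard gRPC status codes
GRPC_STATUS_NAMES = {
    0: "OK",
    4: "DEADLINE_EXCEEDED",
    7: "PERMISSION_DENIED",
    13: "INTERNAL",
    14: "UNAVAILABLE",
}


def grpc_statuses(debug_error_string):
    """Return every grpc_status value found anywhere in the JSON error string."""
    found = []

    def walk(node):
        if isinstance(node, dict):
            if "grpc_status" in node:
                found.append(node["grpc_status"])
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(json.loads(debug_error_string))
    return found
```

Feeding it the JSON part after `debug_error_string:` and looking the result up in `GRPC_STATUS_NAMES` gives the readable name of the failure.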

Hi @ANobbe :wave:

Thanks for sharing the additional output and details.

There appears to be an issue with ownership and permissions. Running docker exec -it gitlab ls -al /var/opt/gitlab should return items with username/group ownership, but I see a mixture of username/groupname and UIDs/GIDs there.

Can you try following the steps here to fix the permissions and verify if the problem persists? https://docs.gitlab.com/ee/install/docker.html#permission-problems

sudo docker exec gitlab update-permissions
sudo docker-compose restart gitlab
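
To spot the mixed ownership mentioned above more quickly, here is a small Python sketch that takes pasted `ls -al` output and lists the entries whose owner or group shows up as a bare number, i.e. an ID with no matching name on the system where the listing was taken. The helper name `numeric_owners` is made up for illustration, and it assumes the usual long-listing column order and file names without spaces.

```python
def numeric_owners(ls_output):
    """Return (name, owner, group) for entries whose owner or group is numeric."""
    flagged = []
    for line in ls_output.splitlines():
        fields = line.split()
        # Long format: mode, links, owner, group, size, month, day, time/year, name
        if len(fields) < 9:
            continue  # skips the "total N" line and blank lines
        owner, group, name = fields[2], fields[3], fields[8]
        if owner.isdigit() or group.isdigit():
            flagged.append((name, owner, group))
    return flagged
```

Running it on the listing before and after update-permissions shows whether anything actually changed.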