(Disaster Recovery Assistance) Self-Hosted CE Instance Possible Data Corruption - Omnibus Unable To Start

Issue
I run my own self-hosted GitLab CE instance on Kubernetes 1.23. One of the nodes GitLab was running on recently failed. Unfortunately, I was in the middle of migrating data around on my NAS, and I didn’t realize I had moved the GitLab data onto a ZFS dataset that did not yet have snapshots enabled. As a result, the copy on my storage server and the offsite backup hold the same single version of the data. I believe the node crash caused some data corruption, because GitLab can no longer complete the gitlab-ctl reconfigure process.

Setup

Relevant Kubernetes Manifests:

apiVersion: v1
kind: ConfigMap
metadata:
  name: gitlab-configmap-environment
  namespace: personal-4

data:
  GITLAB_OMNIBUS_CONFIG: |
    external_url 'https://mypublicaddress'

    gitlab_rails['gitlab_default_theme'] = 11
    gitlab_rails['gitlab_shell_ssh_port'] = 22 # Defined, but will not be used (HTTPS is easier...)
    gitlab_rails['lfs_enabled'] = true
    gitlab_rails['trusted_proxies'] = [ '10.42.0.0/16' ]

    letsencrypt['enable'] = false
    nginx['real_ip_trusted_addresses'] = [ '10.42.0.0/16' ]
    nginx['listen_port'] = 80 # Using HTTP between the Traefik container and GitLab
    nginx['listen_https'] = false # Traefik is handling HTTPS termination
    nginx['redirect_http_to_https'] = true
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: gitlab-statefulset
  namespace: personal-4
  labels:
    app: gitlab

spec:
  replicas: 1
  selector:
    matchLabels:
      app: gitlab
  serviceName: "gitlab-service"
  updateStrategy:
    type: RollingUpdate
  minReadySeconds: 300
  template:
    metadata:
      labels:
        app: gitlab
    spec:
      containers:
      - name: gitlab
        image: docker.io/gitlab/gitlab-ce:14.7.4-ce.0
        # command: [ "sleep", "infinity" ]
        envFrom:
        - configMapRef:
            name: gitlab-configmap-environment
        ports:
        - protocol: TCP
          containerPort: 80
        resources:
          requests:
            cpu: 500m
            memory: 6Gi
          limits:
            cpu: 4000m
            memory: 10Gi
        volumeMounts:
        - name: gitlab-volume-config
          mountPath: /etc/gitlab/
        - name: gitlab-volume-data
          mountPath: /var/opt/gitlab/
        - name: gitlab-volume-logs
          mountPath: /var/log/gitlab/
        - name: gitlab-volume-devsharedmemory
          mountPath: /dev/shm/
      volumes:
      - name: gitlab-volume-devsharedmemory
        emptyDir:
          medium: Memory
          sizeLimit: 256Mi
  volumeClaimTemplates:
  - metadata:
      name: gitlab-volume-config
    spec:
      accessModes:
      - ReadWriteMany
      storageClassName: personal-4-gitlab-storageclass-config
      resources:
        requests:
          storage: 1Gi
  - metadata:
      name: gitlab-volume-data
    spec:
      accessModes:
      - ReadWriteMany
      storageClassName: personal-4-gitlab-storageclass-data
      resources:
        requests:
          storage: 5Gi
  - metadata:
      name: gitlab-volume-logs
    spec:
      accessModes:
      - ReadWriteMany
      storageClassName: personal-4-gitlab-storageclass-logs
      resources:
        requests:
          storage: 1Gi

Container Logs

Thank you for using GitLab Docker Image!
Current version: gitlab-ce=14.7.4-ce.0

Configure GitLab for your system by editing /etc/gitlab/gitlab.rb file
And restart this container to reload settings.
To do it use docker exec:

  docker exec -it gitlab editor /etc/gitlab/gitlab.rb
  docker restart gitlab

For a comprehensive list of configuration options please see the Omnibus GitLab readme
https://gitlab.com/gitlab-org/omnibus-gitlab/blob/master/README.md

If this container fails to start due to permission problems try to fix it by executing:

  docker exec -it gitlab update-permissions
  docker restart gitlab

Cleaning stale PIDs & sockets
Preparing services...
Starting services...
Configuring GitLab...
/opt/gitlab/embedded/bin/runsvdir-start: line 37: /proc/sys/fs/file-max: Read-only file system
Starting Chef Infra Client, version 15.17.4
resolving cookbooks for run list: ["gitlab"]
Synchronizing Cookbooks:
  - gitlab (0.0.1)
  - package (0.1.0)
  - logrotate (0.1.0)
  - postgresql (0.1.0)
  - redis (0.1.0)
  - monitoring (0.1.0)
  - registry (0.1.0)
  - mattermost (0.1.0)
  - consul (0.1.0)
  - gitaly (0.1.0)
  - praefect (0.1.0)
  - gitlab-kas (0.1.0)
  - gitlab-pages (0.1.0)
  - letsencrypt (0.1.0)
  - nginx (0.1.0)
  - runit (5.1.3)
  - acme (4.1.3)
  - crond (0.1.0)
Installing Cookbook Gems:
Compiling Cookbooks...
Recipe: gitlab::default
  * directory[/etc/gitlab] action create (up to date)
  Converging 268 resources
  * directory[/etc/gitlab] action create (up to date)
  * directory[Create /var/opt/gitlab] action create (up to date)
  * directory[Create /var/log/gitlab] action create (up to date)
  * directory[/opt/gitlab/embedded/etc] action create
    - create new directory /opt/gitlab/embedded/etc
    - change mode from '' to '0755'
    - change owner from '' to 'root'
    - change group from '' to 'root'
  * template[/opt/gitlab/embedded/etc/gitconfig] action create
    - create new file /opt/gitlab/embedded/etc/gitconfig
    - update content in file /opt/gitlab/embedded/etc/gitconfig from none to 5a725a
    --- /opt/gitlab/embedded/etc/gitconfig	2022-03-13 22:12:52.156954579 +0000
    +++ /opt/gitlab/embedded/etc/.chef-gitconfig20220313-31-jvd0oi	2022-03-13 22:12:52.156954579 +0000
    @@ -1 +1,17 @@
    +[pack]
    +  threads = 1
    +[receive]
    +  fsckObjects = true
    +advertisePushOptions = true
    +[repack]
    +  writeBitmaps = true
    +[transfer]
    +  hideRefs=^refs/tmp/
    +hideRefs=^refs/keep-around/
    +hideRefs=^refs/remotes/
    +[core]
    +  alternateRefsCommand="exit 0 #"
    +fsyncObjectFiles = true
    +[fetch]
    +  writeCommitGraph = true
    - change mode from '' to '0755'
Recipe: gitlab::web-server
  * account[Webserver user and group] action create (up to date)

...

      * ruby_block[restart_service] action nothing (skipped due to action :nothing)
      * ruby_block[restart_log_service] action nothing (skipped due to action :nothing)
      * ruby_block[reload_log_service] action nothing (skipped due to action :nothing)
      * directory[/opt/gitlab/sv/grafana] action create (up to date)
      * template[/opt/gitlab/sv/grafana/run] action create (up to date)
      * directory[/opt/gitlab/sv/grafana/log] action create (up to date)
      * directory[/opt/gitlab/sv/grafana/log/main] action create (up to date)
      * template[/opt/gitlab/sv/grafana/log/config] action create (up to date)
      * ruby_block[verify_chown_persisted_on_grafana] action nothing (skipped due to action :nothing)
      * link[/var/log/gitlab/grafana/config] action create (up to date)
      * template[/opt/gitlab/sv/grafana/log/run] action create (up to date)
      * directory[/opt/gitlab/sv/grafana/env] action create (up to date)
      * ruby_block[Delete unmanaged env files for grafana service] action run (skipped due to only_if)
      * template[/opt/gitlab/sv/grafana/check] action create (skipped due to only_if)
      * template[/opt/gitlab/sv/grafana/finish] action create (skipped due to only_if)
      * directory[/opt/gitlab/sv/grafana/control] action create (up to date)
      * link[/opt/gitlab/init/grafana] action create (up to date)
      * file[/opt/gitlab/sv/grafana/down] action nothing (skipped due to action :nothing)
      * directory[/opt/gitlab/service] action create (up to date)
      * link[/opt/gitlab/service/grafana] action create (up to date)
      * ruby_block[wait for grafana service socket] action run (skipped due to not_if)
      - execute the ruby block restart_log_service
    * directory[/opt/gitlab/service] action create (up to date)
    * link[/opt/gitlab/service/grafana] action create (up to date)
    * ruby_block[wait for grafana service socket] action run (skipped due to not_if)

Recipe: gitlab::database_reindexing_disable
  * crond_job[database-reindexing] action delete
    * file[/var/opt/gitlab/crond/database-reindexing] action delete (up to date)
     (up to date)
Recipe: gitlab::gitlab-rails
  * execute[clear the gitlab-rails cache] action run
    [execute] rake aborted!
              Errno::EACCES: Permission denied - connect(2) for /var/opt/gitlab/redis/redis.socket
              /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/instrumentation/redis_interceptor.rb:21:in `call'
              /opt/gitlab/embedded/service/gitlab-rails/lib/tasks/cache.rake:17:in `block (6 levels) in <top (required)>'
              /opt/gitlab/embedded/service/gitlab-rails/lib/tasks/cache.rake:16:in `loop'
              /opt/gitlab/embedded/service/gitlab-rails/lib/tasks/cache.rake:16:in `block (5 levels) in <top (required)>'
              /opt/gitlab/embedded/service/gitlab-rails/lib/tasks/cache.rake:14:in `each'
              /opt/gitlab/embedded/service/gitlab-rails/lib/tasks/cache.rake:14:in `block (4 levels) in <top (required)>'
              /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/redis/wrapper.rb:23:in `block in with'
              /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/redis/wrapper.rb:23:in `with'
              /opt/gitlab/embedded/service/gitlab-rails/lib/tasks/cache.rake:10:in `block (3 levels) in <top (required)>'
              /opt/gitlab/embedded/bin/bundle:23:in `load'
              /opt/gitlab/embedded/bin/bundle:23:in `<main>'
              Tasks: TOP => cache:clear => cache:clear:redis
              (See full trace by running task with --trace)

    ================================================================================
    Error executing action `run` on resource 'execute[clear the gitlab-rails cache]'
    ================================================================================

    Mixlib::ShellOut::ShellCommandFailed
    ------------------------------------
    Expected process to exit with [0], but received '1'
    ---- Begin output of /opt/gitlab/bin/gitlab-rake cache:clear ----
    STDOUT: 
    STDERR: rake aborted!
    Errno::EACCES: Permission denied - connect(2) for /var/opt/gitlab/redis/redis.socket
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/instrumentation/redis_interceptor.rb:21:in `call'
    /opt/gitlab/embedded/service/gitlab-rails/lib/tasks/cache.rake:17:in `block (6 levels) in <top (required)>'
    /opt/gitlab/embedded/service/gitlab-rails/lib/tasks/cache.rake:16:in `loop'
    /opt/gitlab/embedded/service/gitlab-rails/lib/tasks/cache.rake:16:in `block (5 levels) in <top (required)>'
    /opt/gitlab/embedded/service/gitlab-rails/lib/tasks/cache.rake:14:in `each'
    /opt/gitlab/embedded/service/gitlab-rails/lib/tasks/cache.rake:14:in `block (4 levels) in <top (required)>'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/redis/wrapper.rb:23:in `block in with'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/redis/wrapper.rb:23:in `with'
    /opt/gitlab/embedded/service/gitlab-rails/lib/tasks/cache.rake:10:in `block (3 levels) in <top (required)>'
    /opt/gitlab/embedded/bin/bundle:23:in `load'
    /opt/gitlab/embedded/bin/bundle:23:in `<main>'
    Tasks: TOP => cache:clear => cache:clear:redis
    (See full trace by running task with --trace)
    ---- End output of /opt/gitlab/bin/gitlab-rake cache:clear ----
    Ran /opt/gitlab/bin/gitlab-rake cache:clear returned 1

    Resource Declaration:
    ---------------------
    # In /opt/gitlab/embedded/cookbooks/cache/cookbooks/gitlab/recipes/gitlab-rails.rb

    450: execute "clear the gitlab-rails cache" do
    451:   command "/opt/gitlab/bin/gitlab-rake cache:clear"
    452:   action :nothing
    453:   not_if { omnibus_helper.not_listening?('redis') || !node['gitlab']['gitlab-rails']['rake_cache_clear'] }
    454: end
    455: 

    Compiled Resource:
    ------------------
    # Declared in /opt/gitlab/embedded/cookbooks/cache/cookbooks/gitlab/recipes/gitlab-rails.rb:450:in `from_file'

    execute("clear the gitlab-rails cache") do
      action [:nothing]
      default_guard_interpreter :execute
      command "/opt/gitlab/bin/gitlab-rake cache:clear"
      backup 5
      declared_type :execute
      cookbook_name "gitlab"
      recipe_name "gitlab-rails"
      domain nil
      user nil
      not_if { #code block }
    end

    System Info:
    ------------
    chef_version=15.17.4
    platform=ubuntu
    platform_version=20.04
    ruby=ruby 2.7.5p203 (2021-11-24 revision f69aeb8314) [x86_64-linux]
    program_name=/opt/gitlab/embedded/bin/chef-client
    executable=/opt/gitlab/embedded/bin/chef-client

Recipe: gitlab::gitlab-workhorse
  * runit_service[gitlab-workhorse] action restart (up to date)
Recipe: monitoring::gitlab-exporter
  * runit_service[gitlab-exporter] action restart (up to date)
Recipe: monitoring::redis-exporter
  * runit_service[redis-exporter] action restart (up to date)
Recipe: monitoring::prometheus
  * runit_service[prometheus] action restart (up to date)
Recipe: monitoring::alertmanager
  * runit_service[alertmanager] action restart (up to date)
Recipe: monitoring::postgres-exporter
  * runit_service[postgres-exporter] action restart (up to date)
Recipe: monitoring::grafana
  * runit_service[grafana] action restart (up to date)

Running handlers:
There was an error running gitlab-ctl reconfigure:

execute[clear the gitlab-rails cache] (gitlab::gitlab-rails line 450) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
---- Begin output of /opt/gitlab/bin/gitlab-rake cache:clear ----
STDOUT: 
STDERR: rake aborted!
Errno::EACCES: Permission denied - connect(2) for /var/opt/gitlab/redis/redis.socket
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/instrumentation/redis_interceptor.rb:21:in `call'
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/cache.rake:17:in `block (6 levels) in <top (required)>'
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/cache.rake:16:in `loop'
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/cache.rake:16:in `block (5 levels) in <top (required)>'
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/cache.rake:14:in `each'
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/cache.rake:14:in `block (4 levels) in <top (required)>'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/redis/wrapper.rb:23:in `block in with'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/redis/wrapper.rb:23:in `with'
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/cache.rake:10:in `block (3 levels) in <top (required)>'
/opt/gitlab/embedded/bin/bundle:23:in `load'
/opt/gitlab/embedded/bin/bundle:23:in `<main>'
Running handlers complete
Tasks: TOP => cache:clear => cache:clear:redis
(See full trace by running task with --trace)
---- End output of /opt/gitlab/bin/gitlab-rake cache:clear ----
Ran /opt/gitlab/bin/gitlab-rake cache:clear returned 1

Chef Infra Client failed. 319 resources updated in 02 minutes 21 seconds

I have tried a few miscellaneous troubleshooting steps, but none seem to have any effect.

I couldn’t easily find the repositories on the persistent storage to be able to just transfer them to a new GitLab instance.

Any assistance with either getting it working, or even just getting my git repositories out would be hugely appreciated!

The error is related to not being able to connect to the redis socket and appears to be a permissions error. Have you investigated that any further? What does sudo gitlab-ctl status give you?

The repository data should be in a location like /var/opt/gitlab/git-data/repositories/@hashed/, but if you have a backup, why not restore it?
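As a rough sketch of how to locate those repositories, the helper below lists every bare repository directory under the hashed storage root. The function name list_bare_repos is made up for illustration, and the default path assumes the data volume is mounted at /var/opt/gitlab/ as in the StatefulSet above; when run directly on the NAS, the prefix will differ.

```shell
#!/usr/bin/env bash
# Sketch: list every bare repository under GitLab's hashed storage root.
# The first argument overrides the default root (an assumption based on the
# mount path used in this thread).
list_bare_repos() {
  local root="${1:-/var/opt/gitlab/git-data/repositories/@hashed}"
  # Each repository is a directory whose name ends in ".git"; -prune stops
  # find from descending into the repository once it has been printed.
  find "$root" -type d -name '*.git' -prune -print | sort
}
```

Calling list_bare_repos with no argument uses the default root. Directories ending in .wiki.git hold project wikis rather than the main repositories.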

Regarding the permissions error, I’ve tried running update-permissions, deleting the redis directory (so it would be recreated), and even setting the socket’s permissions to 777. Unfortunately, none of this got the gitlab-ctl reconfigure command any further.
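One thing that may be worth ruling out: connecting to a Unix socket needs write permission on the socket itself and search (execute) permission on every directory leading to it, so a socket set to 777 can still be unreachable if a parent directory's ownership or mode changed during the data migration. A minimal sketch for checking this from a shell inside the container follows. show_path_perms is a hypothetical helper name, and stat -c assumes GNU coreutils, which the Ubuntu-based image should provide.

```shell
#!/usr/bin/env bash
# Sketch: print mode, owner:group and name for a path and each of its parent
# directories, so a directory that blocks traversal is easy to spot.
show_path_perms() {
  local path="$1"
  while :; do
    # A missing path only produces an error on stderr; the walk continues.
    stat -c '%A %U:%G %n' "$path"
    # Stop at the filesystem root (or "." when a relative path was given).
    if [ "$path" = "/" ] || [ "$path" = "." ]; then
      break
    fi
    path="$(dirname "$path")"
  done
}

# Example (run manually inside the GitLab container):
# show_path_perms /var/opt/gitlab/redis/redis.socket
```

The account that runs gitlab-rake needs an x bit on every directory in that listing and a w bit on the socket itself, either as owner, through its group, or through the "other" bits.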

After attempting a gitlab-ctl reconfigure, gitlab-ctl status provides:
[screenshot of gitlab-ctl status output; image not included]

I have found the hashed repositories, but I do not know of a way to reconstruct the original projects from them.

I cannot restore from a backup because the backup is at the same corrupted state as my current data.

Your gitlab-ctl status output is missing a bunch of services, which I suppose is not too surprising.

Regarding the repos, if you navigate to the bottom of the directory tree in /var/opt/gitlab/git-data/repositories/@hashed/ you will find your bare repositories there, which you should be able to use to push to a fresh GitLab instance (git push --all).
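A hedged sketch of that push, assuming git is available wherever the repository data is mounted. It relies on the assumption that each bare repository's config file carries a [gitlab] section with a fullpath key recording the original group/project path; if that key is missing, the name has to be recovered some other way. The function names and the example URL are illustrative only, and tags are pushed separately because git push does not accept --all together with --tags.

```shell
#!/usr/bin/env bash
# Sketch only: function names and NEW_BASE_URL are illustrative.

# Print the project path recorded in a bare repository's config file
# (assumed to be the "fullpath" key in the "[gitlab]" section).
# Returns non-zero when the key is absent.
repo_fullpath() {
  git config --file "$1/config" --get gitlab.fullpath
}

# Push all branches, then all tags, from a bare repository to a remote URL.
push_repo() {
  local repo="$1" url="$2"
  git --git-dir="$repo" push --all "$url" &&
    git --git-dir="$repo" push --tags "$url"
}

# Example loop (not run automatically). It assumes the target projects
# already exist on the new instance or are created on first push.
# NEW_BASE_URL="https://new-gitlab.example.com"
# for repo in /var/opt/gitlab/git-data/repositories/@hashed/*/*/*.git; do
#   case "$repo" in *.wiki.git|*.design.git) continue ;; esac
#   path="$(repo_fullpath "$repo")" || continue
#   push_repo "$repo" "$NEW_BASE_URL/$path.git"
# done
```

Trying this on a single repository first, and checking the result in the new instance, is safer than running the whole loop at once.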