I deployed GitLab via the Helm chart. After a power loss shut down the cluster, everything except GitLab came back up fine. Since I couldn't figure out exactly why it wasn't starting, I decided to update it, so I ran helm upgrade gitlab ...othervars and upgraded the deployment.
However, the migration is now failing because Redis isn't starting properly.
Migration logs (shortened to remove the full stack trace; the full trace is available here):
Begin parsing .erb files from /var/opt/gitlab/templates
Writing /srv/gitlab/config/resque.yml
Writing /srv/gitlab/config/gitlab.yml
Writing /srv/gitlab/config/database.yml
Copying other config files found in /var/opt/gitlab/templates
Attempting to run '/scripts/wait-for-deps /scripts/db-migrate' as a main process
Checking database connection and schema version
Database Schema - current: 20200221142216, codebase: 20200325152327
Checking database migrations are up-to-date
Performing migrations (this will initialized if needed)
rake aborted!
StandardError: An error has occurred, this and all later migrations canceled:
Error connecting to Redis on gitlab-redis-master:6379 (Redis::TimeoutError)
...
/srv/gitlab/lib/tasks/gitlab/db.rake:49:in `block (3 levels) in <main>'
Caused by:
Redis::CannotConnectError: Error connecting to Redis on gitlab-redis-master:6379 (Redis::TimeoutError)
...
/srv/gitlab/lib/tasks/gitlab/db.rake:49:in `block (3 levels) in <main>'
Caused by:
Redis::TimeoutError: Redis::TimeoutError
...
/srv/gitlab/lib/tasks/gitlab/db.rake:49:in `block (3 levels) in <main>'
Caused by:
IO::EINPROGRESSWaitWritable: Operation now in progress - connect(2) would block
...
/srv/gitlab/lib/tasks/gitlab/db.rake:49:in `block (3 levels) in <main>'
Tasks: TOP => db:migrate
(See full trace by running task with --trace)
== 20200221144534 DropActivatePrometheusServicesBackgroundJobs: migrating =====
gitlab-redis-master-0 pod:
metrics container (running):
time="2020-04-07T02:03:01Z" level=info msg="Redis Metrics Exporter v1.3.5 build date: 2019-12-16-18:43:41 sha1: 14dda66e724e45935782db610aca803594107ff0 Go: go1.13.5 GOOS: linux GOARCH: amd64"
time="2020-04-07T02:03:01Z" level=info msg="Providing metrics at :9121/metrics"
time="2020-04-07T02:03:07Z" level=error msg="Couldn't connect to redis instance"
time="2020-04-07T02:04:07Z" level=error msg="Couldn't connect to redis instance"
... (the same error repeats)
gitlab-redis container (CrashLoopBackOff):
02:39:37.36 INFO ==> ** Starting Redis **
1:C 07 Apr 2020 02:39:37.379 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 07 Apr 2020 02:39:37.379 # Redis version=5.0.7, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 07 Apr 2020 02:39:37.379 # Configuration loaded
1:M 07 Apr 2020 02:39:37.382 * Running mode=standalone, port=6379.
1:M 07 Apr 2020 02:39:37.382 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 07 Apr 2020 02:39:37.382 # Server initialized
1:M 07 Apr 2020 02:39:37.382 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:M 07 Apr 2020 02:39:37.383 * Reading RDB preamble from AOF file...
1:M 07 Apr 2020 02:39:37.392 * Reading the remaining AOF tail...
1:M 07 Apr 2020 02:39:38.191 # Bad file format reading the append only file: make a backup of your AOF file, then use ./redis-check-aof --fix <filename>
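The last Redis line above points at a corrupted append-only file, most likely left behind by the power loss. Below is a minimal, untested sketch of the backup-and-repair that the message suggests, run from a throwaway pod because the Redis container itself is crash-looping. Only the namespace comes from this post. The StatefulSet name, the PVC name, the AOF path, the helper pod name `aof-fix` and the `redis:5.0.7` image are all assumptions to verify against the cluster first. The script only prints the commands unless it is run with `DRY_RUN=0`.

```shell
#!/bin/sh
# Sketch only. Prints each command by default; run with DRY_RUN=0 to execute.
set -eu

NAMESPACE="gitlab"                      # from the helm commands in this post
STATEFULSET="gitlab-redis-master"       # assumption: StatefulSet that owns gitlab-redis-master-0
PVC="redis-data-gitlab-redis-master-0"  # assumption: verify with `kubectl -n gitlab get pvc`
AOF="/data/appendonly.aof"              # assumption: AOF location inside the volume
IMAGE="redis:5.0.7"                     # assumption: any image shipping redis-check-aof 5.0.x
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Pod spec override: mount the Redis volume, back up the AOF, then repair it.
# `yes` answers the confirmation prompt that --fix shows before truncating.
overrides() {
  cat <<EOF
{"spec":{"containers":[{"name":"aof-fix","image":"$IMAGE","command":["sh","-c","cp $AOF $AOF.bak && yes | redis-check-aof --fix $AOF"],"volumeMounts":[{"name":"data","mountPath":"/data"}]}],"volumes":[{"name":"data","persistentVolumeClaim":{"claimName":"$PVC"}}]}}
EOF
}

repair_steps() {
  # 1. Stop the crash-looping pod so nothing else has the AOF open.
  run kubectl -n "$NAMESPACE" scale statefulset "$STATEFULSET" --replicas=0
  # 2. Run a one-off pod against the same volume; it is removed when it exits.
  run kubectl -n "$NAMESPACE" run aof-fix --rm --attach --restart=Never \
    --image="$IMAGE" --overrides="$(overrides)"
  # 3. Bring Redis back up on the repaired file.
  run kubectl -n "$NAMESPACE" scale statefulset "$STATEFULSET" --replicas=1
}

repair_steps
```

Note that `redis-check-aof --fix` repairs the file by truncating it at the first invalid entry, so any writes after that point are lost. That is why the sketch copies the file to `appendonly.aof.bak` before touching it.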
values.yaml for helm chart:
# helm install -n gitlab gitlab gitlab/gitlab -f manifests/gitlab/values.yml
# helm upgrade -n gitlab gitlab gitlab/gitlab -f manifests/gitlab/values.yml
global:
  edition: ee
  hosts:
    domain: example.com
    https: false
    gitlab:
      name: gitlab.example.com
      https: true
    minio:
      name: minio.example.com
      https: false
    registry:
      name: cr.example.com
      https: true
  ingress:
    configureCertmanager: false
    class: nginx
    enabled: true
    tls:
      enabled: true
    annotations:
      cert-manager.io/cluster-issuer: "letsencrypt-prod"
      kubernetes.io/tls-acme: true
      nginx.ingress.kubernetes.io/proxy-body-size: 512m
      nginx.ingress.kubernetes.io/proxy-connect-timeout: 15
  gitaly:
    persistence:
      size: 8Gi
  psql:
    host: gitlab-postgres-postgresql
    database: gitlab
    user: gitlab
    password:
      secret: gitlab-postgresql
      key: postgres-gitlab-password
  minio:
    enabled: true
  grafana:
    enabled: false
  appConfig:
    ldap:
      servers:
        main:
          label: 'LDAP'
          host: 'ipa.example.com'
          port: 389
          uid: 'uid'
          base: 'asdf'
          active_directory: 'false'
          attributes:
            email: ['mail', 'email']
          bind_dn: 'asdf'
          password:
            secret: ldap-bind
            key: ldap-password
          encryption: 'plain'
  registry:
    enabled: true
    bucket: registry
gitlab:
  migrations:
    enabled: true
  unicorn:
    ingress:
      tls:
        secretName: gitlab-unicorn-tls
upgradeCheck:
  enabled: false
certmanager:
  install: false
nginx-ingress:
  enabled: false
prometheus:
  install: true
redis:
  persistence:
    size: 1Gi
postgresql:
  install: false
registry:
  enabled: true
  ingress:
    enabled: true
    tls:
      enabled: true
      secretName: gitlab-registry-tls
    annotations:
      cert-manager.io/cluster-issuer: "letsencrypt-prod"
      kubernetes.io/tls-acme: true
      nginx.ingress.kubernetes.io/proxy-body-size: 512m
      nginx.ingress.kubernetes.io/proxy-connect-timeout: 15
gitlab-runner:
  install: true
  privileged: true
  rbac:
    create: true
  runners:
    locked: false
    privileged: true
minio:
  persistence:
    size: 10Gi
gitaly:
  persistence:
    size: 8Gi
How can I repair Redis so that it starts properly again?