Upgrade from 12.0 to 13.0.1 appears to have broken postgress for my gitlab instance

I have a gitlab instance installed on a centos 7.8 VM. I was having some issues renewing the LE certs via gitlab-ctl reconfigure and noticed a forum suggested upgrading may help - so I ran yum -y update and rebooted - gitlab appeared to come back online fine initially, however it looks like postgresql starts fine, but dies immediately:

[root@GITLAB ~]# gitlab-ctl status
run: alertmanager: (pid 21393) 936s; run: log: (pid 9834) 3492s
run: crond: (pid 21407) 936s; run: log: (pid 9843) 3492s
run: gitaly: (pid 21420) 935s; run: log: (pid 9823) 3492s
run: gitlab-exporter: (pid 21430) 935s; run: log: (pid 9835) 3492s
run: gitlab-workhorse: (pid 21441) 935s; run: log: (pid 9809) 3492s
run: grafana: (pid 21454) 934s; run: log: (pid 9846) 3492s
run: logrotate: (pid 21470) 934s; run: log: (pid 9822) 3492s
run: nginx: (pid 21477) 934s; run: log: (pid 9818) 3493s
run: node-exporter: (pid 21493) 934s; run: log: (pid 9813) 3493s
run: postgres-exporter: (pid 21498) 933s; run: log: (pid 9837) 3493s
**down: postgresql: 1s, normally up, want up; run: log: (pid 9833) 3493s**
run: prometheus: (pid 21614) 927s; run: log: (pid 9825) 3493s
run: redis: (pid 21624) 926s; run: log: (pid 9831) 3493s
run: redis-exporter: (pid 21630) 926s; run: log: (pid 9824) 3493s
run: sidekiq: (pid 25246) 3s; run: log: (pid 9832) 3493s
run: unicorn: (pid 25221) 7s; run: log: (pid 9806) 3493s

The logs suggest there is a version incompatibility:

[root@GITLAB ~]# gitlab-ctl tail postgresql
==> /var/log/gitlab/postgresql/state <==

==> /var/log/gitlab/postgresql/current <==
2020-05-29_11:59:52.94849 FATAL:  database files are incompatible with server
2020-05-29_11:59:52.94850 DETAIL:  The data directory was initialized by PostgreSQL version 10, which is not compatible with this version 11.7.
2020-05-29_11:59:53.97221 FATAL:  database files are incompatible with server
2020-05-29_11:59:53.97222 DETAIL:  The data directory was initialized by PostgreSQL version 10, which is not compatible with this version 11.7.
2020-05-29_11:59:55.00948 FATAL:  database files are incompatible with server

[root@GITLAB ~]# cat /var/opt/gitlab/postgresql/data/PG_VERSION
10
[root@GITLAB ~]#  /opt/gitlab/embedded/bin/pg_ctl --version
pg_ctl (PostgreSQL) 11.7
[root@GITLAB ~]#  /opt/gitlab/embedded/bin/psql --version
psql (PostgreSQL) 11.7

I’ve read some articles suggesting that running gitlab-ctl pg-upgrade could fix this however this prompts for gitlab-ctl reconfigure to be run:

gitlab-ctl pg-upgrade
[root@GITLAB ~]# gitlab-ctl pg-upgrade
Traceback (most recent call last):
        8: from /opt/gitlab/embedded/bin/omnibus-ctl:23:in `<main>'
        7: from /opt/gitlab/embedded/bin/omnibus-ctl:23:in `load'
        6: from /opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/omnibus-ctl-0.6.0/bin/omnibus-ctl:31:in `<top (required)>'
        5: from /opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/omnibus-ctl-0.6.0/lib/omnibus-ctl.rb:746:in `run'
        4: from /opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/omnibus-ctl-0.6.0/lib/omnibus-ctl.rb:204:in `block in add_command_under_category'
        3: from /opt/gitlab/embedded/service/omnibus-ctl/pg-upgrade.rb:115:in `block in load_file'
        2: from /opt/gitlab/embedded/service/omnibus-ctl/lib/gitlab_ctl/util.rb:105:in `roles'
        1: from /opt/gitlab/embedded/service/omnibus-ctl/lib/gitlab_ctl/util.rb:64:in `get_node_attributes'
/opt/gitlab/embedded/service/omnibus-ctl/lib/gitlab_ctl/util.rb:52:in `parse_json_file': Attributes not found in /opt/gitlab/embedded/nodes/gitlab.mydomain.com.json, has reconfigure been run yet? (GitlabCtl::Errors::NodeError)

when I run gitlab-ctl reconfigure it errors due to postgresql being unable to connect:

gitlab-ctl reconfigure

    Compiled Resource:
    ------------------
    # Declared in /opt/gitlab/embedded/cookbooks/cache/cookbooks/gitlab/recipes/database_migrations.rb:55:in `from_file'

    bash("migrate gitlab-rails database") do
      action [:run]
      default_guard_interpreter :default
      command nil
      backup 5
      interpreter "bash"
      declared_type :bash
      cookbook_name "gitlab"
      recipe_name "database_migrations"
      code "    set -e\n    log_file=\"/var/log/gitlab/gitlab-rails/gitlab-rails-db-migrate-$(date +%Y-%m-%d-%H-%M-%S).log\"\n    umask 077\n    /opt/gitlab/bin/gitlab-rake gitlab:db:configure 2>& 1 | tee ${log_file}\n    STATUS=${PIPESTATUS[0]}\n    chown git:git ${log_file}\n    echo $STATUS > /var/opt/gitlab/gitlab-rails/upgrade-status/db-migrate-873248b1f0d3a7a5535771a3a1635803-74623c80da9\n    exit $STATUS\n"
      domain nil
      user nil
      not_if "(test -f /var/opt/gitlab/gitlab-rails/upgrade-status/db-migrate-873248b1f0d3a7a5535771a3a1635803-74623c80da9) && (cat /var/opt/gitlab/gitlab-rails/upgrade-status/db-migrate-873248b1f0d3a7a5535771a3a1635803-74623c80da9 | grep -Fx 0)"
      only_if { #code block }
    end

    System Info:
    ------------
    chef_version=14.14.29
    platform=centos
    platform_version=7.8.2003
    ruby=ruby 2.6.6p146 (2020-03-31 revision 67876) [x86_64-linux]
    program_name=/opt/gitlab/embedded/bin/chef-client
    executable=/opt/gitlab/embedded/bin/chef-client


Running handlers:
There was an error running gitlab-ctl reconfigure:

bash[migrate gitlab-rails database] (gitlab::database_migrations line 55) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
---- Begin output of "bash"  "/tmp/chef-script20200529-31793-1fru7vk" ----
STDOUT: rake aborted!
PG::ConnectionBad: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/var/opt/gitlab/postgresql/.s.PGSQL.5432"?
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/gitlab/db.rake:48:in `block (3 levels) in <top (required)>'
/opt/gitlab/embedded/bin/bundle:23:in `load'
/opt/gitlab/embedded/bin/bundle:23:in `<main>'
Tasks: TOP => gitlab:db:configure
(See full trace by running task with --trace)
STDERR:
---- End output of "bash"  "/tmp/chef-script20200529-31793-1fru7vk" ----
Ran "bash"  "/tmp/chef-script20200529-31793-1fru7vk" returned 1

Running handlers complete
Chef Client failed. 4 resources updated in 01 minutes 04 seconds

I seem to be going in a loop trying to figure this out. I guess the best way forward is to manually downgrade postgresql? In which case is there any documentation available for doing so?

NB: gitlab-ctl revert-pg-upgrade doesn’t work since 13.01 is shipped with 11.7:

[root@PRODC-GITLAB-SLO1 ~]# gitlab-ctl revert-pg-upgrade -V 10
The specified major version 10 is not supported. Choose from one of 11, 11, 11.
[root@PRODC-GITLAB-SLO1 ~]# gitlab-ctl revert-pg-upgrade
Checking if we need to downgrade: OK
Already running 11.7

Any help is greatly appreciated!

I’ve ommited some logs and config files due to a character limit - if there’s anything specific that would help investigate this please let me know

Postgress related config in gitlab.rb:

gitlab.rb
[root@GITLAB ~]# grep "postgres\|pg" /etc/gitlab/gitlab.rb
##! during upgrades.
# gitlab_rails['backup_pg_schema'] = 'public'
# gitlab_rails['db_adapter'] = "postgresql"
###! Changing any of these settings requires a restart of postgresql.
###! By default, reconfigure reloads postgresql if it is running. If you
###! change any of these settings, be sure to run `gitlab-ctl restart postgresql`
# postgresql['enable'] = true
# postgresql['listen_address'] = nil
# postgresql['port'] = 5432
# postgresql['data_dir'] = "/var/opt/gitlab/postgresql/data"
# postgresql['shared_buffers'] = "256MB"
# postgresql['ha'] = false
# postgresql['dir'] = "/var/opt/gitlab/postgresql"
# postgresql['log_directory'] = "/var/log/gitlab/postgresql"
# postgresql['username'] = "gitlab-psql"
##! `SQL_USER_PASSWORD_HASH` can be generated using the command `gitlab-ctl pg-password-md5 gitlab`
# postgresql['sql_user_password'] = 'SQL_USER_PASSWORD_HASH'
# postgresql['uid'] = nil
# postgresql['gid'] = nil
# postgresql['shell'] = "/bin/sh"
# postgresql['home'] = "/var/opt/gitlab/postgresql"
# postgresql['user_path'] = "/opt/gitlab/embedded/bin:/opt/gitlab/bin:$PATH"
# postgresql['sql_user'] = "gitlab"
# postgresql['max_connections'] = 200
# postgresql['md5_auth_cidr_addresses'] = []
# postgresql['trust_auth_cidr_addresses'] = []
# postgresql['wal_buffers'] = "-1"
# postgresql['autovacuum_max_workers'] = "3"
# postgresql['autovacuum_freeze_max_age'] = "200000000"
# postgresql['log_statement'] = nil
# postgresql['track_activity_query_size'] = "1024"
# postgresql['shared_preload_libraries'] = nil
# postgresql['dynamic_shared_memory_type'] = nil
# postgresql['hot_standby'] = "off"
# See https://www.postgresql.org/docs/9.6/static/runtime-config-connection.html#GUC-SSL-CERT-FILE for more details
# postgresql['ssl'] = 'on'
# postgresql['ssl_ciphers'] = 'HIGH:MEDIUM:+3DES:!aNULL:!SSLv3:!TLSv1'
# postgresql['ssl_cert_file'] = 'server.crt'
# postgresql['ssl_key_file'] = 'server.key'
# postgresql['ssl_ca_file'] = '/opt/gitlab/embedded/ssl/certs/cacert.pem'
# postgresql['ssl_crl_file'] = nil
# postgresql['wal_level'] = "hot_standby"
# postgresql['max_wal_senders'] = 5
# postgresql['max_replication_slots'] = 0
# postgresql['max_locks_per_transaction'] = 128
# postgresql['archive_mode'] = "off"
###! Changing any of these settings only requires a reload of postgresql. You do not need to
###! restart postgresql if you change any of these and run reconfigure.
# postgresql['work_mem'] = "16MB"
# postgresql['maintenance_work_mem'] = "16MB"
# postgresql['checkpoint_segments'] = 10
# postgresql['checkpoint_timeout'] = "5min"
# postgresql['checkpoint_completion_target'] = 0.9
# postgresql['effective_io_concurrency'] = 1
# postgresql['checkpoint_warning'] = "30s"
# postgresql['effective_cache_size'] = "1MB"
# postgresql['shmmax'] =  17179869184 # or 4294967295
# postgresql['shmall'] =  4194304 # or 1048575
# postgresql['autovacuum'] = "on"
# postgresql['log_autovacuum_min_duration'] = "-1"
# postgresql['autovacuum_naptime'] = "1min"
# postgresql['autovacuum_vacuum_threshold'] = "50"
# postgresql['autovacuum_analyze_threshold'] = "50"
# postgresql['autovacuum_vacuum_scale_factor'] = "0.02"
# postgresql['autovacuum_analyze_scale_factor'] = "0.01"
# postgresql['autovacuum_vacuum_cost_delay'] = "20ms"
# postgresql['autovacuum_vacuum_cost_limit'] = "-1"
# postgresql['statement_timeout'] = "60000"
# postgresql['idle_in_transaction_session_timeout'] = "60000"
# postgresql['log_line_prefix'] = "%a"
# postgresql['max_worker_processes'] = 8
# postgresql['max_parallel_workers_per_gather'] = 0
# postgresql['log_lock_waits'] = 1
# postgresql['deadlock_timeout'] = '5s'
# postgresql['track_io_timing'] = 0
# postgresql['default_statistics_target'] = 1000
# postgresql['min_wal_size'] = 80MB
# postgresql['max_wal_size'] = 1GB
# postgresql['archive_command'] = nil
# postgresql['archive_timeout'] = "0"
# postgresql['sql_replication_user'] = "gitlab_replicator"
# postgresql['sql_replication_password'] = "md5 hash of postgresql password" # You can generate with `gitlab-ctl pg-password-md5 <dbuser>`
# postgresql['wal_keep_segments'] = 10
# postgresql['max_standby_archive_delay'] = "30s"
# postgresql['max_standby_streaming_delay'] = "30s"
# postgresql['synchronous_commit'] = on
# postgresql['synchronous_standby_names'] = ''
# postgresql['hot_standby_feedback'] = 'off'
# postgresql['random_page_cost'] = 2.0
# postgresql['log_temp_files'] = -1
# postgresql['log_checkpoints'] = 'off'
# To add custom entries to pg_hba.conf use the following
# postgresql['custom_pg_hba_entries'] = {
# See https://www.postgresql.org/docs/9.6/static/auth-pg-hba-conf.html for an explanation
#  "Upgrade" => "$http_upgrade",
#  "Connection" => "$connection_upgrade"
#   "Upgrade" => "$http_upgrade",
#   "Connection" => "$connection_upgrade"
##! Docs: https://docs.gitlab.com/ce/administration/monitoring/prometheus/postgres_exporter.html
# postgres_exporter['enable'] = true
# postgres_exporter['home'] = '/var/opt/gitlab/postgres-exporter'
# postgres_exporter['log_directory'] = '/var/log/gitlab/postgres-exporter'
# postgres_exporter['flags'] = {}
# postgres_exporter['listen_address'] = 'localhost:9187'
##! Docs: https://docs.gitlab.com/ee/administration/monitoring/prometheus/pgbouncer_exporter.html
# pgbouncer_exporter['enable'] = false
# pgbouncer_exporter['log_directory'] = "/var/log/gitlab/pgbouncer-exporter"
# pgbouncer_exporter['listen_address'] = 'localhost:9188'
# geo_secondary['db_adapter'] = "postgresql"
# geo_secondary['db_host'] = "/var/opt/gitlab/geo-postgresql"
# geo_postgresql['enable'] = false
# geo_postgresql['ha'] = false
# geo_postgresql['dir'] = '/var/opt/gitlab/geo-postgresql'
# geo_postgresql['data_dir'] = '/var/opt/gitlab/geo-postgresql/data'
# geo_postgresql['pgbouncer_user'] = nil
# geo_postgresql['pgbouncer_user_password'] = nil
# See [GitLab PgBouncer documentation](http://docs.gitlab.com/omnibus/settings/database.html#enabling-pgbouncer-ee-only)
# See the [PgBouncer page](https://pgbouncer.github.io/config.html) for details
# pgbouncer['enable'] = false
# pgbouncer['log_directory'] = '/var/log/gitlab/pgbouncer'
# pgbouncer['data_directory'] = '/var/opt/gitlab/pgbouncer'
# pgbouncer['listen_addr'] = '0.0.0.0'
# pgbouncer['listen_port'] = '6432'
# pgbouncer['pool_mode'] = 'transaction'
# pgbouncer['server_reset_query'] = 'DISCARD ALL'
# pgbouncer['application_name_add_host'] = '1'
# pgbouncer['max_client_conn'] = '2048'
# pgbouncer['default_pool_size'] = '100'
# pgbouncer['min_pool_size'] = '0'
# pgbouncer['reserve_pool_size'] = '5'
# pgbouncer['reserve_pool_timeout'] = '5.0'
# pgbouncer['server_round_robin'] = '0'
# pgbouncer['log_connections'] = '0'
# pgbouncer['server_idle_timeout'] = '30'
# pgbouncer['dns_max_ttl'] = '15.0'
# pgbouncer['dns_zone_check_period'] = '0'
# pgbouncer['dns_nxdomain_ttl'] = '15.0'
# pgbouncer['admin_users'] = %w(gitlab-psql postgres pgbouncer)
# pgbouncer['stats_users'] = %w(gitlab-psql postgres pgbouncer)
# pgbouncer['ignore_startup_parameters'] = 'extra_float_digits'
# pgbouncer['databases'] = {
# pgbouncer['logfile'] = nil
# pgbouncer['unix_socket_dir'] = nil
# pgbouncer['unix_socket_mode'] = '0777'
# pgbouncer['unix_socket_group'] = nil
# pgbouncer['auth_type'] = 'md5'
# pgbouncer['auth_hba_file'] = nil
# pgbouncer['auth_query'] = 'SELECT username, password FROM public.pg_shadow_lookup($1)'
# pgbouncer['users'] = {
# postgresql['pgbouncer_user'] = nil
# postgresql['pgbouncer_user_password'] = nil
# pgbouncer['server_reset_query_always'] = 0
# pgbouncer['server_check_query'] = 'select 1'
# pgbouncer['server_check_delay'] = 30
# pgbouncer['max_db_connections'] = nil
# pgbouncer['max_user_connections'] = nil
# pgbouncer['syslog'] = 0
# pgbouncer['syslog_facility'] = 'daemon'
# pgbouncer['syslog_ident'] = 'pgbouncer'
# pgbouncer['log_disconnections'] = 1
# pgbouncer['log_pooler_errors'] = 1
# pgbouncer['stats_period'] = 60
# pgbouncer['verbose'] = 0
# pgbouncer['server_lifetime'] = 3600
# pgbouncer['server_connect_timeout'] = 15
# pgbouncer['server_login_retry'] = 15
# pgbouncer['query_timeout'] = 0
# pgbouncer['query_wait_timeout'] = 120
# pgbouncer['client_idle_timeout'] = 0
# pgbouncer['client_login_timeout'] = 60
# pgbouncer['autodb_idle_timeout'] = 3600
# pgbouncer['suspend_timeout'] = 10
# pgbouncer['idle_transaction_timeout'] = 0
# pgbouncer['pkt_buf'] = 4096
# pgbouncer['listen_backlog'] = 128
# pgbouncer['sbuf_loopcnt'] = 5
# pgbouncer['max_packet_size'] = 2147483647
# pgbouncer['tcp_defer_accept'] = 0
# pgbouncer['tcp_socket_buffer'] = 0
# pgbouncer['tcp_keepalive'] = 1
# pgbouncer['tcp_keepcnt'] = 0
# pgbouncer['tcp_keepidle'] = 0
# pgbouncer['tcp_keepintvl'] = 0
# pgbouncer['disable_pqexec'] = 0
# pgbouncer['client_tls_sslmode'] = 'disable'
# pgbouncer['client_tls_ca_file'] = nil
# pgbouncer['client_tls_key_file'] = nil
# pgbouncer['client_tls_cert_file'] = nil
# pgbouncer['client_tls_protocols'] = 'all'
# pgbouncer['client_tls_dheparams'] = 'auto'
# pgbouncer['client_tls_ecdhcurve'] = 'auto'
# pgbouncer['server_tls_sslmode'] = 'disable'
# pgbouncer['server_tls_ca_file'] = nil
# pgbouncer['server_tls_key_file'] = nil
# pgbouncer['server_tls_cert_file'] = nil
# pgbouncer['server_tls_protocols'] = 'all'
# pgbouncer['server_tls_ciphers'] = 'fast'
# repmgr['pg_bindir'] = '/opt/gitlab/embedded/bin'
# repmgr['service_start_command'] = '/opt/gitlab/bin/gitlab-ctl start postgresql'
# repmgr['service_stop_command'] = '/opt/gitlab/bin/gitlab-ctl stop postgresql'
# repmgr['service_reload_command'] = '/opt/gitlab/bin/gitlab-ctl hup postgresql'
# repmgr['service_restart_command'] = '/opt/gitlab/bin/gitlab-ctl restart postgresql'
# repmgr['promote_command'] = '/opt/gitlab/embedded/bin/repmgr standby promote -f /var/opt/gitlab/postgresql/repmgr.conf'
# repmgr['follow_command'] = '/opt/gitlab/embedded/bin/repmgr standby follow -f /var/opt/gitlab/postgresql/repmgr.conf'
#   'postgresql' => {
#       'name' => "postgresql",
#           'script' => "/var/opt/gitlab/consul/scripts/check_postgresql",
#   'postgresql' => {
#     handler: 'failover_pgbouncer'

I’m just prepping for this step - maybe this could help.
upgrade-packaged-postgresql-server

Thanks :slight_smile: Wish I stumbled upon this earlier!

Naively since I’ve upgraded this instance numerous times in the past I stormed ahead without any research… I feel this is going to be a painful lesson learned!

1 Like

When youre able to recover your instance it would be very neat when you share the steps you took - Who knows if I need it later this day :wink:

1 Like

Same here… Where you able to fix this?

I restored from a VM backup in the end to get this running quickly, however I kept the broken image to see if I could fix it.

I’m still trying to resolve this by restoring the DB files to a postgresql 10.0 DB -> doing an sql dump -> restoring the sql dump to a blank gitlab DB. Steps I plan to take or have already taken are below.

I will update this post when I have a full method for a fix.

get gitlab running with a blank DB again
  • move the old data dir

    mv /var/opt/gitlab/postgresql/data/ /var/opt/gitlab/postgresql/data.old/
    
  • recreate it with nothing inside

    mkdir /var/opt/gitlab/postgresql/data

  • init the db using gitlab-psql user

      su gitlab-psql -c  "/opt/gitlab/embedded/bin/initdb -D /var/opt/gitlab/postgresql/data"
    
  • copy the certificates and runtime conf to the new blank db

    cp -p /var/opt/gitlab/postgresql/data.old/server.* /var/opt/gitlab/postgresql/data/ && cp -p /var/opt/gitlab/postgresql/data.old/runtime.conf /var/opt/gitlab/postgresql/data/
  • fix permissions

    chmod 750 /var/opt/gitlab/postgresql/data && chown -R gitlab-psql:gitlab-psql /var/opt/gitlab/postgresql/data
  • rake

    gitlab-rake gitlab:setup

  • reconfigure

    gitlab-ctl reconfigure

  • restart gitlab
    gitlab-ctl restart

sql dump of original db
  • copy db files to wherever you have installed postgres10

    cp -Rp /var/opt/gitlab/postgresql/data /var/lib/pgsql/10/data/

  • open db to ensure it works

    psql -d gitlabhq_production
      psql (10.7)
      Type "help" for help.
    

    gitlabhq_production=#

  • sql dump

    pg_dump gitlabhq_production -U gitlab-psql -Fc /tmp/gitlabhq_production.sql

restore DB files to 11.7 db

This won’t work as there are differences in the db structure with the 13.01 upgrade:
see here for more info: https://gitlab.com/gitlab-org/gitlab/-/issues/219001

su gitlab-psql  -c "/opt/gitlab/embedded/bin/pg_restore -U gitlab-psql -d gitlabhq_production  -w /tmp/gitlabhq_production.sql

hopefully I’ll find a solid resolution soon…

bummer.
you can try this:


It meight be that you need a reconfigure between the updates.