GitLab 11.4.0 encountering issues with PostgreSQL after OS upgrade to RHEL 7.6

We recently upgraded the OS:

$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.6 (Maipo)

Since the upgrade, we have been facing a lot of issues with GitLab, mostly with PostgreSQL.

Our GitLab is dockerized, i.e. GitLab and all its internal services, including PostgreSQL, run inside a single container. The container does not have its own glibc, so it uses the one from the host OS.
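To check the glibc assumption, it may help to compare the glibc version on the host with the one inside the container. Docker containers share the host kernel but normally ship their own userland, libc included, so the two versions can differ. A minimal sketch follows; `libc_version` is a hypothetical helper name and `gitlab` stands in for your actual container name.

```shell
#!/bin/sh
# Hypothetical helper: print the C library version of the current environment.
# getconf GNU_LIBC_VERSION answers on glibc systems; ldd --version is a fallback.
libc_version() {
  getconf GNU_LIBC_VERSION 2>/dev/null \
    || ldd --version 2>&1 | head -n 1
}

# On the host:
libc_version

# Inside the container ("gitlab" is a placeholder container name):
#   docker exec gitlab getconf GNU_LIBC_VERSION
```

If the two versions differ, the container is using its own glibc rather than the host's.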

ERROR: canceling statement due to statement timeout

STATEMENT:
SELECT relnamespace::regnamespace as schemaname,
relname as relname,
pg_total_relation_size(oid) bytes FROM pg_class WHERE relkind = 'r';

These timeout messages appear continuously, and as a result users get 502 errors when accessing GitLab.

I checked the statement timeout set on the database.

gitlabhq_production=# show statement_timeout;
 statement_timeout
-------------------
 1min
(1 row)
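To see how long the query actually takes, it can be re-run by hand with a longer timeout that applies only to that one session, so GitLab's own connections keep the 1min default. The sketch below, assuming the Omnibus `gitlab-psql` wrapper and a placeholder container name `gitlab`, prints a psql script that does this; `timeout_test_sql` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: emit a psql script that re-runs the catalog-size query
# from the log with a 120 s statement timeout limited to this session,
# plus psql's client-side \timing so the elapsed time is printed.
timeout_test_sql() {
  cat <<'SQL'
SET statement_timeout = '120s';
\timing on
SELECT relnamespace::regnamespace AS schemaname,
       relname,
       pg_total_relation_size(oid) AS bytes
FROM pg_class
WHERE relkind = 'r';
SQL
}

# Usage ("gitlab" is a placeholder container name):
#   timeout_test_sql | docker exec -i gitlab gitlab-psql -d gitlabhq_production
```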

We see these errors occasionally. Sometimes they disappear on their own, but sometimes the messages appear continuously, and then the web UI goes down and we keep getting 502s. Restarting GitLab, or even the container, does not help; only a server reboot does.

I checked pg_stat_activity but did not see any locks, since the server had already been rebooted. The same query runs fine now, but the issue keeps recurring intermittently.
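Since a reboot clears this state, pg_stat_activity is most informative while the timeouts are happening. A query along these lines lists the non-idle backends, longest-running first, with what each one is waiting on. It assumes PostgreSQL 9.6 or later, where `wait_event_type` and `wait_event` were introduced; note that 9.6 reports only lock and buffer-pin waits, so a backend blocked on disk I/O shows an empty `wait_event` there. `active_queries_sql` and the container name `gitlab` are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: emit a query listing non-idle backends, longest-running
# first, with their wait event (columns available from PostgreSQL 9.6).
active_queries_sql() {
  cat <<'SQL'
SELECT pid,
       state,
       wait_event_type,
       wait_event,
       now() - query_start AS runtime,
       left(query, 80) AS query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY runtime DESC NULLS LAST;
SQL
}

# Usage ("gitlab" is a placeholder container name):
#   active_queries_sql | docker exec -i gitlab gitlab-psql -d gitlabhq_production
```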

I ran \d pg_class to check whether the table has any indexes and to look at the type of its string column:

gitlabhq_production=# \d pg_class
         Table "pg_catalog.pg_class"
       Column        |   Type    | Modifiers
---------------------+-----------+-----------
 relname             | name      | not null
 relnamespace        | oid       | not null
 reltype             | oid       | not null
 reloftype           | oid       | not null
 relowner            | oid       | not null
 relam               | oid       | not null
 relfilenode         | oid       | not null
 reltablespace       | oid       | not null
 relpages            | integer   | not null
 reltuples           | real      | not null
 relallvisible       | integer   | not null
 reltoastrelid       | oid       | not null
 relhasindex         | boolean   | not null
 relisshared         | boolean   | not null
 relpersistence      | "char"    | not null
 relkind             | "char"    | not null
 relnatts            | smallint  | not null
 relchecks           | smallint  | not null
 relhasoids          | boolean   | not null
 relhaspkey          | boolean   | not null
 relhasrules         | boolean   | not null
 relhastriggers      | boolean   | not null
 relhassubclass      | boolean   | not null
 relrowsecurity      | boolean   | not null
 relforcerowsecurity | boolean   | not null
 relispopulated      | boolean   | not null
 relreplident        | "char"    | not null
 relfrozenxid        | xid       | not null
 relminmxid          | xid       | not null
 relacl              | aclitem[] |
 reloptions          | text[]    |
Indexes:
    "pg_class_oid_index" UNIQUE, btree (oid)
    "pg_class_relname_nsp_index" UNIQUE, btree (relname, relnamespace)
    "pg_class_tblspc_relfilenode_index" btree (reltablespace, relfilenode)

I reindexed all system catalogs and all database tables, but I still see the timeouts. Any ideas on what might have gone wrong and how to fix it? We are on GitLab CE 11.4 and RHEL 7.6.

Hey @faisalchishtii, could you please set the timeout to 120 seconds and check what happens?

Hi @dsumenkovic

I tried that and still get the same statement timeout error. We have since found that high CPU load is causing this.

Here are the top 30 processes that are running:

Top 30 processes:

%CPU PID USER COMMAND
6.2 3229 root /usr/lib/systemd/systemd-journald
3.6 31498 polkitd unicorn worker[0] -D -E production -c /var/opt/gitlab/gitlab-rails/etc/unicorn.rb /opt/gitlab/embedded/service/gitlab-rails/config.ru
2.5 27391 root /usr/bin/topbeat -c /etc/topbeat/topbeat.yml
2.0 7866 polkitd /opt/gitlab/embedded/bin/rake gitlab:backup:create
1.7 20887 polkitd tar -cf - repositories db uploads.tar.gz builds.tar.gz artifacts.tar.gz pages.tar.gz lfs.tar.gz backup_information.yml
1.6 11323 polkitd /opt/gitlab/embedded/bin/gitaly /var/opt/gitlab/gitaly/config.toml
1.5 18526 root [kworker/6:2H]
1.1 31103 polkitd unicorn worker[2] -D -E production -c /var/opt/gitlab/gitlab-rails/etc/unicorn.rb /opt/gitlab/embedded/service/gitlab-rails/config.ru
1.0 6664 root /usr/bin/dockerd --selinux-enabled -s btrfs --icc=false
1.0 10755 polkitd sidekiq 5.2.1 gitlab-rails [0 of 25 busy]
0.9 704 polkitd unicorn worker[7] -D -E production -c /var/opt/gitlab/gitlab-rails/etc/unicorn.rb /opt/gitlab/embedded/service/gitlab-rails/config.ru
0.8 32487 polkitd unicorn worker[8] -D -E production -c /var/opt/gitlab/gitlab-rails/etc/unicorn.rb /opt/gitlab/embedded/service/gitlab-rails/config.ru
0.7 29593 polkitd unicorn worker[4] -D -E production -c /var/opt/gitlab/gitlab-rails/etc/unicorn.rb /opt/gitlab/embedded/service/gitlab-rails/config.ru
0.6 30969 opssupp java -jar /home/jenkins/agent.jar
0.6 30968 opssupp java -jar /home/jenkins/agent.jar
0.6 30788 polkitd unicorn worker[1] -D -E production -c /var/opt/gitlab/gitlab-rails/etc/unicorn.rb /opt/gitlab/embedded/service/gitlab-rails/config.ru
0.6 30671 polkitd unicorn worker[3] -D -E production -c /var/opt/gitlab/gitlab-rails/etc/unicorn.rb /opt/gitlab/embedded/service/gitlab-rails/config.ru
0.6 22428 polkitd unicorn worker[6] -D -E production -c /var/opt/gitlab/gitlab-rails/etc/unicorn.rb /opt/gitlab/embedded/service/gitlab-rails/config.ru
0.6 11340 992 /opt/gitlab/embedded/bin/prometheus -web.listen-address=localhost:9090 -storage.local.path=/var/opt/gitlab/prometheus/data -storage.local.chunk-encoding-version=2 -storage.local.target-heap-size=697552158 -config.file=/var/opt/gitlab/prometheus/prometheus.yml
0.6 11337 polkitd /opt/gitlab/embedded/bin/gitlab-workhorse -listenNetwork unix -listenUmask 0 -listenAddr /var/opt/gitlab/gitlab-workhorse/socket -authBackend http://localhost:8080 -authSocket /var/opt/gitlab/gitlab-rails/sockets/gitlab.socket -documentRoot /opt/gitlab/embedded/service/gitlab-rails/public -pprofListenAddr -prometheusListenAddr localhost:9229 -secretPath /opt/gitlab/embedded/service/gitlab-rails/.gitlab_workhorse_secret -config config.toml
0.5 6662 root /usr/sbin/rsyslogd -n
0.5 31106 polkitd unicorn worker[5] -D -E production -c /var/opt/gitlab/gitlab-rails/etc/unicorn.rb /opt/gitlab/embedded/service/gitlab-rails/config.ru
0.5 11331 chrony /opt/gitlab/embedded/bin/redis-server 127.0.0.1:0
0.4 11643 996 postgres: gitlab-psql postgres [local] SELECT
0.3 11339 polkitd /opt/gitlab/embedded/bin/ruby /opt/gitlab/embedded/bin/gitlab-mon web -c /var/opt/gitlab/gitlab-monitor/gitlab-monitor.yml
0.2 7423 systemd+ mysqld
0.2 7403 systemd+ mysqld
0.2 6920 daemon /opt/quest/sbin/.vasd -p /var/opt/quest/vas/vasd/.vasd.pid
0.2 6919 daemon /opt/quest/sbin/.vasd -p /var/opt/quest/vas/vasd/.vasd.pid
============================================
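If this list came from `ps`, keep in mind that ps computes %CPU as CPU time divided by the process's lifetime, so these figures are long-run averages and cannot show a short CPU spike. Also, ps on the host resolves UIDs through the host's /etc/passwd, which would explain why container processes appear under host names such as polkitd and chrony. A minimal sketch of the listing follows, with `top_cpu` as a hypothetical helper name; `pidstat` from the sysstat package is one way to get per-interval CPU usage instead.

```shell
#!/bin/sh
# Hypothetical helper: list the N processes (default 30) with the highest %CPU.
# ps reports %CPU as CPU time / elapsed lifetime, i.e. an average, not current use.
top_cpu() {
  ps -eo pcpu,pid,user,args --sort=-pcpu | head -n "$(( ${1:-30} + 1 ))"
}

top_cpu 30

# Per-process CPU usage measured over a 5-second interval instead (sysstat):
#   pidstat -u 5 1
```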

Processes in state 'D' (uninterruptible sleep):

root     11333  0.0  0.0   4372   352 ?        D    Jun26   0:00 svlogd -tt /var/log/gitlab/gitlab-monitor
root     11483  0.1  0.0   4396   620 ?        D    Jun26   1:04 tail --follow=name --retry /var/log/gitlab/sshd/current /var/log/gitlab/gitlab-shell/gitlab-shell.log /var/log/gitlab/gitlab-rails/sidekiq.log /var/log/gitlab/gitlab-rails/gitlab-rails-db-migrate-2018-08-01-03-17-40.log /var/log/gitlab/gitlab-rails/production.log /var/log/gitlab/gitlab-rails/grpc.log /var/log/gitlab/gitlab-rails/production_json.log /var/log/gitlab/gitlab-rails/api_json.log /var/log/gitlab/gitlab-rails/application.log /var/log/gitlab/gitlab-rails/gitlab-rails-db-migrate-2018-08-01-04-16-50.log /var/log/gitlab/gitlab-rails/githost.log /var/log/gitlab/gitlab-rails/repocheck.log /var/log/gitlab/gitlab-rails/sidekiq_exporter.log /var/log/gitlab/gitlab-rails/gitlab-rails-db-migrate-2018-10-03-02-33-22.log /var/log/gitlab/gitlab-rails/integrations_json.log /var/log/gitlab/gitlab-rails/gitlab-rails-db-migrate-2018-10-26-01-09-17.log /var/log/gitlab/redis/current /var/log/gitlab/redis/state /var/log/gitlab/postgresql/state /var/log/gitlab/postgresql/current /var/log/gitlab/logrotate/state /var/log/gitlab/logrotate/current /var/log/gitlab/unicorn/unicorn_stderr.log /var/log/gitlab/unicorn/state /var/log/gitlab/unicorn/unicorn_stdout.log /var/log/gitlab/unicorn/current /var/log/gitlab/sidekiq/current /var/log/gitlab/sidekiq/state /var/log/gitlab/gitlab-workhorse/current /var/log/gitlab/gitlab-workhorse/state /var/log/gitlab/nginx/error.log /var/log/gitlab/nginx/gitlab_access.log /var/log/gitlab/nginx/gitlab_error.log /var/log/gitlab/nginx/access.log /var/log/gitlab/nginx/current /var/log/gitlab/nginx/et number /var/log/gitlab/nginx/.gitlab_access.log.swp /var/log/gitlab/nginx/gitlab_access.log.1 /var/log/gitlab/gitaly/current /var/log/gitlab/gitaly/state /var/log/gitlab/node-exporter/state /var/log/gitlab/node-exporter/current /var/log/gitlab/gitlab-monitor/current /var/log/gitlab/gitlab-monitor/state /var/log/gitlab/redis-exporter/state /var/log/gitlab/redis-exporter/current /var/log/gitlab/prometheus/current /var/log/gitlab/prometheus/state /var/log/gitlab/postgres-exporter/current /var/log/gitlab/postgres-exporter/state /var/log/gitlab/alertmanager/current /var/log/gitlab/alertmanager/state
996      11622  0.0  0.0  51096  4456 ?        Ds   Jun26   0:03 postgres: autovacuum launcher process
996      11623  0.0  0.0  33724  2600 ?        Ds   Jun26   0:21 postgres: stats collector process
996      11643  0.4  0.0  61032 10472 ?        Ds   Jun26   3:55 postgres: gitlab-psql postgres [local] SELECT
996      11648  0.0  0.0  60712  8452 ?        Ds   Jun26   0:14 postgres: gitlab gitlabhq_production [local] SELECT
root     17523  0.0  0.0 208908 12868 ?        D    Jun26   0:57 python logship.py
polkitd  20887  1.7  0.0  37880  1904 ?        D    03:11   0:18 tar -cf - repositories db uploads.tar.gz builds.tar.gz artifacts.tar.gz pages.tar.gz lfs.tar.gz backup_information.yml
polkitd  28878  0.0  0.0  19420  1380 ?        D    03:24   0:00 /opt/gitlab/embedded/bin/git upload-pack --stateless-rpc --advertise-refs /var/opt/gitlab/git-data/repositories/col-trading/presales/mobile-a.git
polkitd  28879  0.0  0.0  19760  1120 ?        D    03:24   0:00 /opt/gitlab/embedded/bin/git upload-pack --stateless-rpc --advertise-refs /var/opt/gitlab/git-data/repositories/col-trading/presales/products-options.git
polkitd  28989  0.0  0.0 280244  1364 ?        D    03:24   0:00 /opt/gitlab/embedded/bin/git upload-pack --stateless-rpc --advertise-refs /var/opt/gitlab/git-data/repositories/customer-engineering/digital/ecommerce/webshop.git
polkitd  29015  0.0  0.0  53140  1368 ?        D    03:24   0:00 /opt/gitlab/embedded/bin/git upload-pack --stateless-rpc --advertise-refs /var/opt/gitlab/git-data/repositories/customer-engineering/selfserve/consumercom/usertools/ordercom-api.git
polkitd  29017  0.0  0.0  19452  1364 ?        D    03:24   0:00 /opt/gitlab/embedded/bin/git upload-pack --stateless-rpc --advertise-refs /var/opt/gitlab/git-data/repositories/customer-engineering/selfserve/consumercom/usertools/selfserve-api.git
polkitd  29018  0.0  0.0  19352  1368 ?        D    03:24   0:00 /opt/gitlab/embedded/bin/git upload-pack --stateless-rpc --advertise-refs /var/opt/gitlab/git-data/repositories/customer-engineering/selfserve/consumercom/usertools/selfserve-2fa-afm-client.git
polkitd  29019  0.0  0.0  19320  1116 ?        D    03:24   0:00 /opt/gitlab/embedded/bin/git upload-pack --stateless-rpc --advertise-refs /var/opt/gitlab/git-data/repositories/customer-engineering/community/swagger-api/myoffers.git
polkitd  30402  0.0  0.0  19760  1120 ?        D    03:26   0:00 /opt/gitlab/embedded/bin/git upload-pack --stateless-rpc --advertise-refs /var/opt/gitlab/git-data/repositories/col-trading/presales/products-options.git
polkitd  30403  0.0  0.0  19420  1384 ?        D    03:26   0:00 /opt/gitlab/embedded/bin/git upload-pack --stateless-rpc --advertise-refs /var/opt/gitlab/git-data/repositories/col-trading/presales/mobile-a.git
polkitd  30409  0.0  0.0  53140  1364 ?        D    03:26   0:00 /opt/gitlab/embedded/bin/git upload-pack --stateless-rpc --advertise-refs /var/opt/gitlab/git-data/repositories/customer-engineering/selfserve/consumercom/usertools/ordercom-api.git
polkitd  30410  0.0  0.0  19452  1352 ?        D    03:26   0:00 /opt/gitlab/embedded/bin/git upload-pack --stateless-rpc --advertise-refs /var/opt/gitlab/git-data/repositories/customer-engineering/selfserve/consumercom/usertools/selfserve-api.git
polkitd  30411  0.0  0.0  19452  1116 ?        D    03:26   0:00 /opt/gitlab/embedded/bin/git upload-pack --stateless-rpc --advertise-refs /var/opt/gitlab/git-data/repositories/customer-engineering/selfserve/consumercom/usertools/selfserve-api.git
polkitd  30413  0.0  0.0  19352  1368 ?        D    03:26   0:00 /opt/gitlab/embedded/bin/git upload-pack --stateless-rpc --advertise-refs /var/opt/gitlab/git-data/repositories/customer-engineering/selfserve/consumercom/usertools/selfserve-2fa-afm-client.git
polkitd  30424  0.0  0.0  19356  1384 ?        D    03:26   0:00 /opt/gitlab/embedded/bin/git upload-pack --stateless-rpc --advertise-refs /var/opt/gitlab/git-data/repositories/customer-engineering/digital/ecommerce/common-handset-picker-client.git
polkitd  30440  0.0  0.0 280244  1368 ?        D    03:27   0:00 /opt/gitlab/embedded/bin/git upload-pack --stateless-rpc --advertise-refs /var/opt/gitlab/git-data/repositories/customer-engineering/digital/ecommerce/webshop.git
polkitd  30459  0.0  0.0  19352  1360 ?        D    03:27   0:00 /opt/gitlab/embedded/bin/git upload-pack --stateless-rpc --advertise-refs /var/opt/gitlab/git-data/repositories/customer-engineering/selfserve/consumercom/usertools/selfserve-2fa-afm-client.git
polkitd  30466  0.0  0.0  19320  1352 ?        D    03:27   0:00 /opt/gitlab/embedded/bin/git upload-pack --stateless-rpc --advertise-refs /var/opt/gitlab/git-data/repositories/customer-engineering/community/swagger-api/myoffers.git
polkitd  30485  0.0  0.0  19352  1120 ?        D    03:27   0:00 /opt/gitlab/embedded/bin/git upload-pack --stateless-rpc --advertise-refs /var/opt/gitlab/git-data/repositories/customer-engineering/selfserve/consumercom/usertools/selfserve-client.git
996      30688  0.0  0.0  62204 10692 ?        Ds   03:27   0:00 postgres: gitlab gitlabhq_production [local] BIND
996      30802  0.0  0.0  61836 10168 ?        Ds   03:27   0:00 postgres: gitlab gitlabhq_production [local] BIND
polkitd  30928  0.0  0.0  20872  1120 ?        D    03:27   0:00 /opt/gitlab/embedded/bin/git upload-pack --stateless-rpc --advertise-refs /var/opt/gitlab/git-data/repositories/customer-engineering/digital/fire/myoffers-api.git
996      31114  0.0  0.0  60352  8308 ?        Ds   03:28   0:00 postgres: gitlab gitlabhq_production [local] PARSE
996      31115  0.0  0.0  60160  8248 ?        Ds   03:28   0:00 postgres: gitlab gitlabhq_production [local] BIND
polkitd  31206  0.0  0.0  19444  1240 ?        D    03:28   0:00 /opt/gitlab/embedded/bin/git --git-dir /var/opt/gitlab/git-data/repositories/customer-engineering/digital/fire/myoffers-api.git cat-file --batch-check
polkitd  31219  0.0  0.0  19876  1236 ?        D    03:28   0:00 /opt/gitlab/embedded/bin/git --git-dir /var/opt/gitlab/git-data/repositories/customer-engineering/mcso/mcso-core.git cat-file --batch-check
polkitd  31255  0.0  0.0  19296  1108 ?        D    03:28   0:00 /opt/gitlab/embedded/bin/git upload-pack /var/opt/gitlab/git-data/repositories/customer-engineering/digital/portal/aem-portal.git
root     31257  0.0  0.0  91392  4484 ?        Ds   03:28   0:00 sshd: git [priv]
polkitd  31263  0.0  0.0  19876  1240 ?        D    03:28   0:00 /opt/gitlab/embedded/bin/git --git-dir /var/opt/gitlab/git-data/repositories/customer-engineering/mcso/mcso-core.git cat-file --batch-check
polkitd  31264  0.0  0.0  19444  1240 ?        D    03:28   0:00 /opt/gitlab/embedded/bin/git --git-dir /var/opt/gitlab/git-data/repositories/customer-engineering/digital/fire/myoffers-api.git cat-file --batch-check
996      31519  0.0  0.0  57192  3660 ?        Ds   03:28   0:00 postgres: gitlab gitlabhq_production [local] startup
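All of the processes above (tail on the log files, git, postgres backends, tar) are in uninterruptible sleep, which usually means they are waiting inside the kernel, typically on storage or filesystem I/O. Adding the `wchan` column shows the kernel function each one is sleeping in, which can hint at what they are stuck on. A minimal sketch, with `blocked_procs` as a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: show processes in uninterruptible sleep (state D)
# with the kernel function they are sleeping in (wchan) and their command line.
blocked_procs() {
  ps -eo state,pid,user,wchan:32,args | awk 'NR == 1 || $1 ~ /^D/'
}

blocked_procs

# Run it a few times during an episode; tasks that stay in D across samples
# are the ones to inspect. As root, where the kernel exposes it,
#   cat /proc/<pid>/stack
# prints the full kernel stack of one such process.
```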

Server Load Average:

39.97 26.48 15.02 1/1243 31532
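The fields of /proc/loadavg are the 1-, 5- and 15-minute load averages, the number of currently runnable scheduling entities over the total, and the most recently created PID. Linux counts tasks in uninterruptible sleep (D) in the load average alongside runnable ones. In the sample above, a 1-minute load of 39.97 on 8 CPUs is about 5.0 per CPU, yet only 1 of 1243 entities was runnable; the 38 D-state processes listed earlier plus that one runnable task come close to 39.97, which suggests the load is mostly blocked tasks rather than CPU work. The sketch below (`load_summary` is a hypothetical helper name) prints the same breakdown on a live system; it counts processes, not threads, so it can undercount D-state tasks.

```shell
#!/bin/sh
# Hypothetical helper: compare /proc/loadavg with the CPU count and the number
# of D-state processes. Linux includes uninterruptible (D) tasks in the load
# average, so a high load with few runnable tasks points at blocked tasks.
load_summary() {
  ncpu=$(nproc)
  # Fields: load1 load5 load15 runnable/total last_pid
  read -r load1 _ _ sched _ < /proc/loadavg
  # The state follows "pid (comm) " in /proc/<pid>/stat; strip that prefix
  # first because comm may contain spaces. Counts processes, not threads.
  blocked=$(cat /proc/[0-9]*/stat 2>/dev/null | sed 's/^.*) //' | awk '$1 == "D"' | wc -l)
  awk -v l1="$load1" -v n="$ncpu" -v s="$sched" -v d="$blocked" 'BEGIN {
    printf "load1=%s cpus=%d load_per_cpu=%.2f runnable/total=%s d_state=%d\n", l1, n, l1 / n, s, d
  }'
}

load_summary
```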

iostat:

Linux 3.10.0-957.10.1.el7.x86_64 (blt04910036) 27/06/19 x86_64 (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           6.59    0.00    2.37    2.37    0.00   88.67

Device:   rrqm/s   wrqm/s      r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  r_await  w_await    svctm    %util
fd0         0.00     0.00     0.00     0.00     0.00     0.00     8.00     0.00    42.59    42.59     0.00    42.59     0.00
sdb         0.10     0.95     0.03     0.11     0.00     0.00    74.51     0.00     0.84     0.72     0.87     0.53     0.01
sdc         0.00     0.33     0.26     0.43     0.00     0.02    85.70     0.00     0.87     0.85     0.87     0.42     0.03
sda         0.04     0.25     1.69    25.64     0.09     0.31    30.34     0.18     6.74     3.13     6.98     0.11     0.30
sdd         0.00     0.36     0.98     1.01     0.05     0.11   164.12     0.00     2.33     3.56     1.15     0.53     0.11
dm-0        0.00     0.00     0.38     0.02     0.03     0.00   130.93     0.00     2.66     2.28     8.93     0.80     0.03
dm-1        0.00     0.00     0.13     1.06     0.00     0.00     8.34     0.00     2.43     1.03     2.60     0.06     0.01
dm-2        0.00     0.00     0.05     0.30     0.00     0.02   105.85     0.00     1.26     3.48     0.93     1.02     0.04
dm-3        0.00     0.00     1.03    24.49     0.06     0.29    27.86     0.18     7.21     4.84     7.31     0.10     0.25
dm-4        0.00     0.00     0.25     0.69     0.01     0.00    22.51     0.00     0.75     1.55     0.46     0.40     0.04
dm-5        0.00     0.00     0.02     0.00     0.00     0.00    34.92     0.00     4.71     3.61   245.00     3.86     0.01
dm-6        0.00     0.00     0.01     0.36     0.00     0.00    15.62     0.00     0.96     2.07     0.92     0.29     0.01

NOTE: These stats were captured at night, while housekeeping jobs such as the GitLab backup were running. However, we also run into this situation in the middle of the day, when no jobs are running.
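If the iostat output above came from a single run without an interval, it shows averages since boot rather than the situation during an episode, which could explain the near-zero %util. Running iostat with an interval during an episode gives per-interval figures from the second report on. A minimal sketch, assuming the sysstat package is installed; `disk_sample` is a hypothetical helper name, and the 5-second interval and 3 reports are arbitrary defaults.

```shell
#!/bin/sh
# Hypothetical helper: extended device statistics in MB, every INTERVAL
# seconds, COUNT times. The first report covers the time since boot;
# each later report covers only the preceding interval.
disk_sample() {
  interval=${1:-5}
  count=${2:-3}
  iostat -xm "$interval" "$count"
}

# Usage during an episode:
#   disk_sample 5 3
# Per-process CPU and disk activity over the same kind of interval (sysstat):
#   pidstat -u -d 5 3
```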