We recently upgraded the OS:
$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.6 (Maipo)
After upgrading, we are facing a lot of issues with GitLab (predominantly with Postgres)…
Our GitLab is dockerized, i.e. GitLab (and all its internal services, including PostgreSQL) runs inside a single container. The container does not have its own glibc, so it is using the one from the OS.
ERROR: canceling statement due to statement timeout
STATEMENT:
SELECT relnamespace::regnamespace as schemaname,
relname as relname,
pg_total_relation_size(oid) bytes FROM pg_class WHERE relkind = 'r';
The timeout messages appear continuously and this results in users facing 502 errors when accessing GitLab.
I checked the statement timeout set on the database.
gitlabhq_production=# show statement_timeout;
statement_timeout
-------------------
1min
(1 row)
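To tell whether the catalog query is genuinely slow or only occasionally stuck, one option is to time it by hand in a psql session with a higher timeout. This is only a sketch and not something I have run as part of the steps above; the 5min value is an arbitrary assumption, and SET affects the current session only.

```sql
-- Raise the timeout for this session only (5min is an arbitrary choice).
SET statement_timeout = '5min';

-- Run the statement from the log and report actual timings and buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT relnamespace::regnamespace AS schemaname,
       relname AS relname,
       pg_total_relation_size(oid) AS bytes
FROM pg_class
WHERE relkind = 'r';

-- Go back to the session default.
RESET statement_timeout;
```

If the statement finishes quickly here but times out when the errors are occurring, that would suggest it is waiting on something rather than being slow by itself.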
We see these errors occasionally. Sometimes they disappear on their own, but sometimes the messages appear continuously, the GUI goes down, and we keep getting 502s. Restarting GitLab or even the container doesn't help; only a server reboot does.
I checked pg_stat_activity and don't see any locks, as the server was rebooted earlier. The same query is running fine now, but we keep seeing this issue intermittently.
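For reference, a check of this kind can be sketched as the two queries below. They are illustrative rather than the exact statements I ran; the 30-second threshold is an arbitrary assumption, and only columns that exist in both pg_stat_activity and pg_locks on PostgreSQL 9.x are used.

```sql
-- Statements that have been running for more than 30 seconds
-- (the threshold is an arbitrary choice).
SELECT pid,
       usename,
       state,
       now() - query_start AS runtime,
       query
FROM pg_stat_activity
WHERE state <> 'idle'
  AND now() - query_start > interval '30 seconds'
ORDER BY runtime DESC;

-- Lock requests that have not been granted yet, with the statement
-- that is waiting on them.
SELECT l.pid,
       l.locktype,
       l.mode,
       l.relation::regclass AS relation,
       a.query
FROM pg_locks AS l
JOIN pg_stat_activity AS a ON a.pid = l.pid
WHERE NOT l.granted;
```

These are most useful when run while the timeouts are actually happening, since a reboot clears whatever was blocking.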
I ran \d pg_class to check whether the table uses any indexes and also to check the string column.
gitlabhq_production=# \d pg_class
      Table "pg_catalog.pg_class"
       Column        |   Type    | Modifiers
---------------------+-----------+-----------
 relname             | name      | not null
 relnamespace        | oid       | not null
 reltype             | oid       | not null
 reloftype           | oid       | not null
 relowner            | oid       | not null
 relam               | oid       | not null
 relfilenode         | oid       | not null
 reltablespace       | oid       | not null
 relpages            | integer   | not null
 reltuples           | real      | not null
 relallvisible       | integer   | not null
 reltoastrelid       | oid       | not null
 relhasindex         | boolean   | not null
 relisshared         | boolean   | not null
 relpersistence      | "char"    | not null
 relkind             | "char"    | not null
 relnatts            | smallint  | not null
 relchecks           | smallint  | not null
 relhasoids          | boolean   | not null
 relhaspkey          | boolean   | not null
 relhasrules         | boolean   | not null
 relhastriggers      | boolean   | not null
 relhassubclass      | boolean   | not null
 relrowsecurity      | boolean   | not null
 relforcerowsecurity | boolean   | not null
 relispopulated      | boolean   | not null
 relreplident        | "char"    | not null
 relfrozenxid        | xid       | not null
 relminmxid          | xid       | not null
 relacl              | aclitem[] |
 reloptions          | text[]    |
Indexes:
    "pg_class_oid_index" UNIQUE, btree (oid)
    "pg_class_relname_nsp_index" UNIQUE, btree (relname, relnamespace)
    "pg_class_tblspc_relfilenode_index" btree (reltablespace, relfilenode)
I reindexed all system catalogs and all database tables, but we are still seeing the timeout issue. Any ideas on what might have gone wrong and what can be done to fix this? We are on GitLab CE v11.4 and RHEL 7.6.