Error upgrading postgresql 12.12 -> 13.8

I’m trying to upgrade to gitlab-ce 15.11 today and getting an error with pg-upgrade. This is an attempt to run it manually outside of the package update, but showing the same results:

# gitlab-ctl pg-upgrade
Checking for an omnibus managed postgresql: OK
Checking if postgresql['version'] is set: OK
Checking if we already upgraded: NOT OK
Checking for a newer version of PostgreSQL to install
Upgrading PostgreSQL to 13.8
Checking if disk for directory /var/opt/gitlab/postgresql/data has enough free space for PostgreSQL upgrade: OK
Checking if PostgreSQL bin files are symlinked to the expected location: OK
Waiting 30 seconds to ensure tasks complete before PostgreSQL upgrade.
See https://docs.gitlab.com/omnibus/settings/database.html#upgrade-packaged-postgresql-server for details
If you do not want to upgrade the PostgreSQL server at this time, enter Ctrl-C and see the documentation for details

Please hit Ctrl-C now if you want to cancel the operation.
Toggling deploy page:cp /opt/gitlab/embedded/service/gitlab-rails/public/deploy.html /opt/gitlab/embedded/service/gitlab-rails/public/index.html
Toggling deploy page: OK
Toggling services:ok: down: gitaly: 0s, normally up
ok: down: gitlab-kas: 0s, normally up
ok: down: grafana: 0s, normally up
ok: down: logrotate: 1s, normally up
ok: down: registry: 0s, normally up
ok: down: sidekiq: 0s, normally up
Toggling services: OK
Running stop on postgresql:ok: down: postgresql: 1s, normally up
Running stop on postgresql: OK
Symlink correct version of binaries: OK
Creating temporary data directory:Error creating new directory: /var/opt/gitlab/postgresql/data.13
STDOUT: 
STDERR: su: Permission denied
Creating temporary data directory: NOT OK
== Fatal error ==
Please check the output
== Reverting ==
ok: down: postgresql: 2s, normally up
Symlink correct version of binaries: OK
ok: run: postgresql: (pid 1870773) 1s
== Reverted ==
== Reverted to 12.12. Please check output for what went wrong ==
Toggling deploy page:rm -f /opt/gitlab/embedded/service/gitlab-rails/public/index.html
Toggling deploy page: OK
Toggling services:ok: run: gitaly: (pid 1870785) 0s
ok: run: gitlab-kas: (pid 1870791) 1s
ok: run: grafana: (pid 1870815) 0s
ok: run: logrotate: (pid 1870826) 0s
ok: run: registry: (pid 1870832) 1s
ok: run: sidekiq: (pid 1870842) 0s
Toggling services: OK

The error is pretty clear that it is unable to create the temp directory /var/opt/gitlab/postgresql/data.13, but I’m not seeing why. The parent dir, /var/opt/gitlab/postgresql/, is owned by the gitlab-psql user with mode 755. There isn’t a preexisting dir data.13 in there that is owned by another user. I’m not finding any other logs that will give me any more details on the problem.

The upgrade to gitlab-ce itself seems to have otherwise been successful, and everything is running fine without the postgresql upgrade. It seems safe to pin it at 12 for now, but I would like to be able to stay as current as possible. Anybody have suggestions for further debugging, or steps to fix or workaround this problem?

I’m installing gitlab-ce from official gitlab.com repos. Running on Debian 11, also fully patched and up to date.

Thanks.
Seth

Hi @sethgali,

The upgrade to PostgreSQL 13.8 as part of 15.11 will be skipped in any of the following cases:

  • You are running the database in high availability mode with Patroni.
  • Your database nodes are part of a GitLab Geo configuration.
  • You have specifically opted out.
  • You have set postgresql['version'] = 12 in your gitlab.rb file.

Could you confirm that none of these conditions apply to you?
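
For anyone who wants to check the last two conditions from a shell, here is a minimal sketch. It assumes the default Omnibus config path /etc/gitlab/gitlab.rb and the opt-out marker file /etc/gitlab/disable-postgresql-upgrade mentioned later in this thread. The function name pg_upgrade_skip_reasons is made up for illustration, and it does not try to detect Patroni or Geo setups.

```shell
#!/bin/sh
# Sketch only: print reasons why pg-upgrade may be skipped on purpose.
# Both paths are arguments so the function can be pointed at copies.
pg_upgrade_skip_reasons() {
    gitlab_rb=${1:-/etc/gitlab/gitlab.rb}                      # assumed default path
    optout_file=${2:-/etc/gitlab/disable-postgresql-upgrade}   # opt-out marker file

    # The marker file opts the host out of automatic upgrades.
    if [ -e "$optout_file" ]; then
        echo "opt-out file present: $optout_file"
    fi

    # An uncommented postgresql['version'] line pins the version.
    if [ -r "$gitlab_rb" ] &&
       grep -v '^[[:space:]]*#' "$gitlab_rb" | grep -Fq "postgresql['version']"; then
        echo "postgresql['version'] is set in $gitlab_rb"
    fi
    return 0
}

# Usage (as root): pg_upgrade_skip_reasons
```

No output means neither of the two file-based conditions was found.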

If none of them apply, we can try creating the directory manually, setting the correct ownership and permissions, and then retrying the pg-upgrade.

sudo mkdir /var/opt/gitlab/postgresql/data.13
sudo chown gitlab-psql /var/opt/gitlab/postgresql/data.13
sudo chmod 755 /var/opt/gitlab/postgresql/data.13
sudo gitlab-ctl pg-upgrade

Give that a try and let us know how it goes.

For additional information, refer to: https://docs.gitlab.com/omnibus/settings/database.html#upgrade-packaged-postgresql-server

Hi There,

I’m having exactly the same issue as @sethgali .

  • I do not run a HA cluster
  • I do not run a GitLab Geo conf
  • I have not created /etc/gitlab/disable-postgresql-upgrade
  • I have not pinned the PG version in gitlab.rb

I created /var/opt/gitlab/postgresql/data.13 myself as you recommended and ensured that gitlab-psql has access rights. I verified this with sudo -u gitlab-psql touch /var/opt/gitlab/postgresql/data.13/tmp, which worked. But gitlab-ctl pg-upgrade still fails with the same error as for @sethgali.

Any idea?

Apologies for not responding sooner, I missed the notification. None of the suggested conditions apply to me. I have a single standalone node as my Gitlab host, no HA or clustering enabled. I only opted out after the upgrade failed, so that future updates would complete successfully. I did try previously manually creating the temp directory and fixing permissions, and that failed with the same error, but neglected to report that in my original post. I disabled the opt-out and tried again today, having just installed the 15.11.2 update, and still came back with the same error: su: Permission denied. It feels like there is an internal check in the update script that is failing incorrectly, but I don’t have a good way to diagnose that specifically.

What user are you running the commands as? I tend to switch to the root user, never use sudo. I find it strange you are getting the su error, since this would hint that you are attempting to upgrade, but as a user that doesn’t have enough privileges.

Please clarify if that is not the case.

I’ve been running all of these commands as root. I’m also finding the su error strange. I think the script is maybe trying to su as the gitlab-psql user, but even so, the directory it wants to write to should have sufficient permissions. Even the parent directory /var/opt/gitlab/postgresql/ is owned by the gitlab-psql user.

Just to confirm permissions, I’ve checked my install. For /var/opt/gitlab/postgresql:

drwxr-xr-x  4 gitlab-psql       root       4.0K May  5 15:53 postgresql

So user is gitlab-psql and group is root. However, the contents are gitlab-psql for both user and group. For example:

drwx------ 19 gitlab-psql gitlab-psql 4.0K May  5 15:53 data

Does yours match that scenario?

Can you also provide a full directory listing for /var/opt/gitlab/postgresql?
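
A quick way to gather the ownership and mode of a directory and every parent above it is sketched below. It assumes GNU stat (as shipped on Debian); the function name show_path_perms is hypothetical.

```shell
#!/bin/sh
# Sketch only: print mode, owner:group and name for a path and each ancestor.
# Requires GNU stat for the -c format option.
show_path_perms() {
    p=$1
    # Only absolute paths are accepted, so the walk always ends at "/".
    case $p in
        /*) ;;
        *) echo "need an absolute path" >&2; return 2 ;;
    esac
    while :; do
        stat -c '%A %U:%G %n' "$p" || return 1
        if [ "$p" = "/" ]; then
            break
        fi
        p=$(dirname "$p")
    done
}

# Usage: show_path_perms /var/opt/gitlab/postgresql/data
```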

Exactly the same.

# ls -la /var/opt/gitlab/postgresql
total 48
drwxr-xr-x  7 gitlab-psql root        4096 May  5 12:05 .
drwxr-xr-x 24 root        root        4096 May  5 12:05 ..
-rwx------  1 gitlab-psql gitlab-psql  795 Jan 22  2021 analyze_new_cluster.sh
drwx------ 19 gitlab-psql gitlab-psql 4096 May  5 12:05 data
drwx------ 19 gitlab-psql gitlab-psql 4096 Apr 27  2020 data.10
drwx------ 19 gitlab-psql gitlab-psql 4096 Jan 22  2021 data.11
drwxr-xr-x  2 gitlab-psql gitlab-psql 4096 May  5 11:38 data.13
drwx------ 19 gitlab-psql root        4096 Jul  5  2019 data.9.6
-rwx------  1 gitlab-psql gitlab-psql   52 Jan 22  2021 delete_old_cluster.sh
-rw-------  1 gitlab-psql root          52 Aug 10  2017 .profile
srwxrwxrwx  1 gitlab-psql gitlab-psql    0 May  5 13:59 .s.PGSQL.5432
-rw-------  1 gitlab-psql gitlab-psql   83 May  5 13:59 .s.PGSQL.5432.lock
-rw-r--r--  1 root        root          28 Dec  2 10:25 VERSION

Edit: Not exactly, but gitlab-psql user still has rwx on the temp and parent directories.

OK, I see old data directories there from previous upgrades, as well as a data.13. Assuming you are at version 12 still, I would delete data.10, data.11 and data.9.6.

I would then check/verify the contents of data.13 and possibly even delete that completely. Also, prior to this, double-check with:

gitlab-psql --version

After this, attempt the upgrade again.

Got rid of all the old temp dirs and tried the upgrade again, both with data.13 present (with correct permissions) and with it removed. Still get the same error.

# gitlab-psql --version
psql (PostgreSQL) 12.12

Which should be the latest v12 schema.
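
If you want to script this check, the major version can be pulled out of that output with a small filter. This is a sketch that assumes the output format shown above; pg_major is a made-up helper name.

```shell
#!/bin/sh
# Sketch only: read a line like "psql (PostgreSQL) 12.12" on stdin
# and print the major version ("12"). Prints nothing if the line
# does not match that format.
pg_major() {
    sed -n 's/.*(PostgreSQL) \([0-9][0-9]*\)\..*/\1/p'
}

# Usage: gitlab-psql --version | pg_major
```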

And if you try by explicitly writing:

gitlab-ctl pg-upgrade -V 13

does it also fail?

Also one more idea:

chown gitlab-psql:gitlab-psql /var/opt/gitlab/postgresql
chmod g+s /var/opt/gitlab/postgresql

Make sure data.13 doesn’t exist, and then try the upgrade again. What we are doing here is changing the group of the postgresql directory from root to gitlab-psql, and then setting the setgid bit (sometimes loosely called the group sticky bit) so that all directories and files created underneath get the group gitlab-psql instead of root, just in case something in the code expects that group rather than root.

It’s easy enough to revert if it doesn’t work.

chmod g-s /var/opt/gitlab/postgresql
chown gitlab-psql:root /var/opt/gitlab/postgresql
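
To see what chmod g+s does without touching the GitLab directories, here is a small sketch that runs on a scratch directory. It sets the setgid bit, which is distinct from the sticky bit, and shows that a subdirectory created afterwards inherits it. It assumes Linux with GNU stat and mktemp; demo_setgid is a hypothetical name.

```shell
#!/bin/sh
# Sketch only: demonstrate the setgid bit on a throwaway directory.
demo_setgid() {
    d=$(mktemp -d) || return 1
    chmod g+s "$d"        # set the setgid bit on the scratch directory
    mkdir "$d/child"      # a new subdirectory inherits the bit and the group
    # %A is the symbolic mode; an "s" or "S" in the group-execute
    # position marks the setgid bit. %G is the group name.
    stat -c '%A %G %n' "$d" "$d/child"
    rm -rf "$d"
}

# Usage: demo_setgid
```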

No luck with the setgid bit either. Still getting the same error.

It sounds to me like there is an issue in the GitLab code somewhere. My upgrades were done well before this version. I believe I was on the version that new GitLab installs shipped with as of 13.x when I did my upgrades manually, and they worked fine then.

I would suggest opening an issue here for the devs to take a look: Issues · GitLab.org / GitLab · GitLab

I’ll try tomorrow to make a test server with Gitlab and a 12.x Postgres and then upgrade it through to 15.11 and see if I can replicate the problem once at 15.11 and I attempt to upgrade Postgres. If I do manage to replicate and then figure out a way around it, I’ll post back here. I’ll also post back if I fail at it too.

We’ve been upgrading continuously on this host since installation in 2017. I’m not sure what version that would have been originally, but we haven’t had any problems until now. I’ll open an issue and see if the devs have any ideas. Thanks for the help.

Missed this one earlier. Explicitly requesting -V 13 still fails.

Link to Issue for reference: https://gitlab.com/gitlab-org/gitlab/-/issues/410059

[RESOLVED] I posted this in the issue linked above, but wanted to share it here for anybody else who might come across it. A colleague helped me debug this today, and we determined that the problem lay in the PAM stack configuration on my host.

/etc/pam.d/su includes /etc/pam.d/common-account (which I think is standard for Debian), and ours includes account required pam_access.so. The pam_access module reads /etc/security/access.conf to determine which users and groups are permitted to log in, and from where. Per our standard configuration, we have -:ALL EXCEPT root serviceacct (servicegroup): ALL at the end to prevent non-admin users from logging in.

I had previously added + : git : ALL to this file, but since that last line denies all other accounts, I also had to add + : gitlab-psql : ALL above it to explicitly allow that account even just to get a local shell. Once I did this, I was able to manually su as the gitlab-psql user to test it, and was finally able to run gitlab-ctl pg-upgrade successfully to upgrade the database. Debugging PAM is fun. Thanks to all who may have taken a look at this.
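
Since pam_access uses the first line in access.conf that matches, a rough way to see which line applies to an account is sketched below. This is a simplified approximation and not how pam_access itself is implemented: it only understands plain login names, ALL and EXCEPT, and it ignores group entries such as (servicegroup), netgroups and the origins field. The function name first_access_rule is made up.

```shell
#!/bin/sh
# Sketch only: print the first access.conf line whose user field covers
# the given login. Group entries and the origins field are NOT evaluated.
first_access_rule() {
    user=$1
    conf=${2:-/etc/security/access.conf}
    awk -F: -v user="$user" '
        /^[ \t]*(#|$)/ { next }              # skip comments and blank lines
        NF >= 3 {
            n = split($2, tok, /[ \t]+/)     # tokens of the users field
            matched = 0; except = 0; excluded = 0
            for (i = 1; i <= n; i++) {
                if (tok[i] == "") continue
                if (tok[i] == "EXCEPT") { except = 1; continue }
                hit = (tok[i] == "ALL" || tok[i] == user)
                if (!except && hit) matched = 1   # named before EXCEPT
                if (except && hit) excluded = 1   # named after EXCEPT
            }
            if (matched && !excluded) { print; exit }
        }
    ' "$conf"
}

# Usage (as root): first_access_rule gitlab-psql
```

A printed line starting with - means the account is denied by that rule, and a line starting with + means it is allowed. Running su to the account as root remains the real test.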


Hello @sethgali, were you running GitLab on a Linux box? Which distro and version? Please share if you can.

I mentioned this in my original post. I’m running on fully patched Debian 11. Currently up to Gitlab 16.2.3 without further problems.
