Upgrades failing

When I try to install the latest GitLab version, the update fails and reports that I need to upgrade PostgreSQL.
When I run sudo gitlab-ctl pg-upgrade, that fails as well, and the error in the log file is:
pg_dump: error: invalid column numbering in table "epic_user_mentions"

I can’t upgrade GitLab because it requires the PostgreSQL upgrade first.
I can’t upgrade PostgreSQL because pg-upgrade fails with the column numbering error.

I’m currently on GitLab Enterprise Edition v16.11

I finally solved my issue.

First I accessed the postgres database with psql:
sudo -u gitlab-psql /opt/gitlab/embedded/bin/psql -h /var/opt/gitlab/postgresql -d gitlabhq_production

Then I ran a query against the epic_user_mentions table:
select * from epic_user_mentions;

This returned an error:
ERROR: postgres pg_dump missing attribute error

From this error I followed the directions of this stackoverflow answer:

and ran the following:
REINDEX INDEX pg_catalog.pg_attribute_relid_attnum_index;

I then reran the select query and the error was gone.

Now I was able to run the PostgreSQL upgrade:
sudo gitlab-ctl pg-upgrade

and finally the gitlab update.
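
For anyone hitting the same error, here is a rough sketch of the steps above as a small shell script. The paths, database name, index name and table name are the ones from this post; the run, gitlab_sql and repair_and_upgrade helpers and the DRY_RUN switch are names I made up for illustration. It defaults to only printing the commands, and as noted further down in this thread, reindexing a system catalog is a salvage step, so take a backup first and treat this as a starting point rather than a tested procedure.

```shell
#!/bin/sh
# Sketch only: replays the manual steps from this post.
# Helper names and DRY_RUN are hypothetical, not part of GitLab.

PSQL=/opt/gitlab/embedded/bin/psql          # psql bundled with Omnibus GitLab
PG_SOCKET_DIR=/var/opt/gitlab/postgresql    # socket directory used in the post
DB=gitlabhq_production

# Print the command instead of running it unless DRY_RUN is set to 0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Run a single SQL statement as the gitlab-psql user.
gitlab_sql() {
  run sudo -u gitlab-psql "$PSQL" -h "$PG_SOCKET_DIR" -d "$DB" -c "$1"
}

# Rebuild the catalog index, confirm the table can be read again,
# then start the PostgreSQL upgrade. Stops at the first failing step.
repair_and_upgrade() {
  gitlab_sql 'REINDEX INDEX pg_catalog.pg_attribute_relid_attnum_index;' &&
  gitlab_sql 'SELECT * FROM epic_user_mentions;' &&
  run sudo gitlab-ctl pg-upgrade
}
```

To try it, source the file and call repair_and_upgrade to see the three commands it would run; set DRY_RUN=0 before calling it only once you are happy with what it prints.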

The original error sounds to me like a previous set of database migrations (which many upgrades run) failed. Unfortunately I don’t know how to find out when the epic_user_mentions table was changed, but there’s a good chance it happened during your last “successful” upgrade. Maybe someone can say something more concrete if you tell us which version you upgraded from.

That Stack Exchange answer says that recreating the index is only something you should do to salvage the data from the database, and that the primary course of action should be to restore from the last good backup.

Your error is not exactly the same as the one in the question (you have invalid column numbering, while the question mentions catalog is missing 2 attribute(s)). I’m not familiar enough with databases to even know what “column numbering” is about, but I guess your database might be in a bad state where the dumps in your backups are only usable on a similarly broken database.

Features that use the affected table might fail in weird ways, and the next migration involving that table might fail.

The error reported by GitLab was different from the error I received from the select against the table in psql. The first didn’t lead me anywhere; the second led me to that SE question. I’m guessing that the underlying error confused GitLab and it returned something unrelated. Smarter heads can better answer that question.
The table was empty, so I hope that there won’t be any future repercussions on migrations. Time will tell.