Hello GitLab Forum,
As grove pointed out in the last thread I created, my topic title was not reaching the right people. So I’m basically posting the same post again, but with a different title and some additional information.
We are running a docker installation of GitLab. It is a GitLab Omnibus installation on version 12.8.2-ee, running in a Docker container for our internal development.
Sadly, this Docker-based Omnibus GitLab has not been maintained in a long time, so it has not been upgraded in a long time either.
Recently my team was assigned responsibility for this GitLab Omnibus instance.
One of the first things we wanted to do was upgrade the Omnibus container, but the upgrades from version 12 to 13 fail every time.
The container is currently running on the 12.8.2-ee.0 Docker Image.
We tried upgrading to 12.10.14-ee.0 and afterwards tried upgrading from there as recommended in the GitLab Docs upgrade path for the enterprise edition.
The problem after upgrading from 12.10.14-ee.0 or 12.10.13-ee.0 to 13.0.14-ee.0 is always the same:
The users are unable to pull or push code and the Web Editor vanishes from the UI.
GitLab 12.8.2-ee (72ed810d281)
GitLab Shell 11.0.0
GitLab Workhorse v8.21.0
GitLab API v4
Does anyone have an idea what we are doing wrong? Or what I could try in order to upgrade the Container to a newer version? Sadly my experience with GitLab administration is almost nonexistent.
Reproduction of the issue:
I was able to reproduce the error when pushing:
$ git push
info: detecting host provider for 'http://InternalGitLabUrl.com/'...
fatal: unable to access 'http://InternalGitLabUrl.com/group/subgroup/project/subproject.git/': The requested URL returned error: 500
Cloning seems to work, so I assume that pulling should work too.
What I noticed while upgrading is that PostgreSQL is not updated when the version of the Docker image changes. Could a PostgreSQL version that is perhaps too ‘new’ cause the migrations to fail or not start at all?
How long did you wait after upgrading to 12.10.14? Usually after an upgrade, there are background migrations that take place, so you need to be 100% sure that these have finished before you start the next upgrade. I personally don’t use docker, but the procedures would be pretty much the same anyway.
Also, it might be worthwhile to adapt the upgrade procedure a little. For example, this is how it looks in the docs here: Upgrading GitLab | GitLab
So, once you get to 12.10.14 and everything is working, make sure the background migrations are finished. The upgrade link I provided should help explain how to do that. If the status is not visible in the web interface, you can use the Ruby commands and check from the console (you might have to docker exec into the container though).
Then adapt the upgrade procedure by going to the first and the last of the point releases. So since 13.0.14 is your next step, do 13.0.0 and then 13.0.14. Then go to 13.1.0 and then 13.1.11. And so on, right up to the latest 15.x release. To be honest, I’ve done all releases as soon as they appeared, so my install has gone through all 13.1.x, 13.2.x, 13.3.x, and so on. I find that this is likely to be less problematic. It could well be that the upgrade guide skips something that causes the problems.
As an aside, when I did my first upgrade, after getting stuck on 12.9.3, I went through every single minor point release. So imagine doing 12.9.4, 12.9.5, 12.9.6 right up to what was at that time one of the 13.x releases - over 70 in total. It took a while, as I had to wait for background migrations to finish before continuing with the next release.
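To make the "first and last point release of each series" idea concrete, here is a minimal Python sketch. It is only an illustration: the function names are hypothetical, and the list of available versions is a hand-written example rather than the real list of published GitLab tags, so you would have to supply that list yourself.

```python
import re
from collections import defaultdict


def parse(version):
    """Turn '13.0.14' or '13.0.14-ee.0' into the tuple (13, 0, 14)."""
    major, minor, patch = re.match(r"(\d+)\.(\d+)\.(\d+)", version).groups()
    return int(major), int(minor), int(patch)


def first_and_last_steps(current, available):
    """Return the first and last patch release of every minor series newer than `current`."""
    cur = parse(current)
    by_series = defaultdict(list)
    for v in available:
        p = parse(v)
        if p > cur:  # ignore the running version and anything older
            by_series[p[:2]].append(p)
    steps = []
    for series in sorted(by_series):
        patches = sorted(by_series[series])
        steps.append(patches[0])
        if patches[-1] != patches[0]:  # a series with one release yields one step
            steps.append(patches[-1])
    return [".".join(map(str, p)) for p in steps]


# Hypothetical, incomplete version list, used purely for illustration.
available = ["12.10.14", "13.0.0", "13.0.6", "13.0.14", "13.1.0", "13.1.5", "13.1.11"]
print(first_and_last_steps("12.10.14", available))
# -> ['13.0.0', '13.0.14', '13.1.0', '13.1.11']
```

The sketch only produces the order of the steps. Waiting for background migrations between steps is still up to you.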
Thanks for your reply. I tried upgrading our testing environment to 12.10.14 multiple times and then tried upgrading as recommended in the documentation. In some of those attempts I waited several days to let the background migrations finish. The problem is that when I query the PostgreSQL database for running or pending background migrations, the returned value is always ‘0’, even right after restarting the container with a newer image, when the migrations should be taking place. So sadly I don’t know whether the upgrade process is broken or whether the database just reports the migration status wrongly.
Do you maybe know other ways or queries to find out more about the background migration status? Or how to start them manually? I wonder whether the already fairly new version of the PostgreSQL component could keep GitLab from starting the background migrations.
We were also thinking about dumping the data of our GitLab instance while it is on 12.0.14 (even in its broken state), deploying the Helm version with the same GitLab version to a Kubernetes cluster, importing the data of the broken Docker GitLab there, and then trying the official upgrade path for the Helm version. Do you think this could be a possible workaround?
It has been known for other people to experience problems with the upgrade when following the path, this is why I suggested doing some extra versions to minimise it. You can also check migration status like this:
That will show every single migration applied, and you should see a status of up if it has finished properly. As mentioned before, I don’t use Docker or Kubernetes/OpenShift for GitLab. I prefer a normal Omnibus installation on a VM because I feel it’s far easier to manage and maintain. But that’s just my personal preference.
Also, restores can only be done to the same type of install. So, omnibus to omnibus, docker to docker, source install to source install. You cannot switch between them. Therefore a backup on omnibus cannot be restored on docker, or vice-versa.
I just booted up my reset testing GitLab and executed the command that you recommended in the Docker container. Do you have some additional information about the command? The status of every migration is the same: it says ‘up’. What does up mean?
I might try to do more steps between the versions. Would you say I should do some of the minor patches? Or should upgrading from 12.8.2 to 12.9.x and then to 12.10.14 be sufficient?
If it is up, that means the migration has finished. If it had a different status, that would mean either the migration hasn’t completed or it has failed.
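For a long status listing it can help to filter for anything that is not up. Below is a small Python sketch of that idea. It assumes the command referred to above is `gitlab-rake db:migrate:status` (the command itself is missing from the post, so this is a guess based on the later mention of rake) and that its output uses the usual Rails table layout. The sample text, including the migration IDs and names, is made up.

```python
import re

# Illustrative output in the layout Rails uses for db:migrate:status;
# the migration IDs and names below are invented for this example.
SAMPLE = """\
database: gitlabhq_production

 Status   Migration ID    Migration Name
--------------------------------------------------
   up     20200101000001  Add example column
   up     20200102000002  Create example table
  down    20200103000003  Backfill example data
"""

# A data row is: status, numeric migration ID, free-text name.
ROW = re.compile(r"^\s*(up|down)\s+(\d+)\s+(.*)$")


def not_up(status_output):
    """Return (migration_id, name) for every migration whose status is not 'up'."""
    pending = []
    for line in status_output.splitlines():
        match = ROW.match(line)
        if match and match.group(1) != "up":
            pending.append((match.group(2), match.group(3).strip()))
    return pending


print(not_up(SAMPLE))
# -> [('20200103000003', 'Backfill example data')]
```

As far as I know, this table only covers the regular database migrations. Background migration jobs are queued separately, so an all-up listing does not by itself say whether background jobs are still running.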
From what I read, your upgrade to 12.10.14 was successful, yes? If so, then that is fine. The majority of problems came up in 13.x or 14.x, which is why I suggested doing more point upgrades from this point on. So do the very first (13.0.0) and the last 13.0.x release, then the same for all 13.x series and all 14.x series, right up until the latest 15.x.
Bear in mind, when I did my 12.9.3 upgrade, I wasn’t sure of the upgrade path, and so I did every single point release. But if you imagine that every 13.x series can have maybe 10 point releases (13.0.1 - 13.0.10), then 13.0 to 13.9 times 10 comes to roughly 90 upgrades if I did all of them: 13.0.0, 13.0.1, 13.0.2… and so on. Whilst that takes a serious amount of time (2 - 3 days for me), at least it was more stable.
But I think that with the method I described (first 13.0.x release, last 13.0.x release, and the same for every series after that: 13.1, 13.2, 13.3, etc.) you should be fine.
Thanks for the tip with the rake, I just upgraded the GitLab instance to 12.10.14 again and queried the migration status. Every task in the list has the status ‘up’ directly after upgrading, is this normal?
Is it maybe possible that not every migration job is started?
Edit: I might try every single point release as well, maybe it helps.
When I did my upgrades I used the background migration check commands. During 12.x and 13.x these would normally show a non-zero value while migrations were being run. Once the value got to zero it was safe to proceed. Not all releases have migrations. Some migrations take longer to process than others (which is why the docs say to wait 24 hours), but if you are sure they have all run and the check is showing zero, you can start the next upgrade. At the time, the GitLab docs stated that most migrations were done in the 13.x.0 releases. However, if you compare that with the upgrade path now, it seems to have diverged from that, and it seems the major migrations are done in the releases you see on the path - basically the upgrades you should never skip.
Also, there is the zero-downtime method, which means doing every single release (although in reality, on a single GitLab server the services are inaccessible for a couple of minutes when they restart, so it isn’t exactly zero downtime). That means absolutely every point release from 12.10.14 up to the latest, which would be hundreds of upgrades, as mentioned. But, as I said, I’m 99% sure you are unlikely to experience problems that way. That doesn’t mean something might not crop up, but it has a far better chance of working.
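The "wait until the check shows zero" step can be sketched as a small polling loop in Python. The function name and parameters are hypothetical, and the real check is passed in as a callable. On 12.x and 13.x that check would presumably wrap `gitlab-rails runner -e production 'puts Gitlab::BackgroundMigration.remaining'`, which is my recollection of the docs of that era, so please verify it for your version. The example below uses a fake counter so that it runs anywhere.

```python
import time


def wait_until_zero(get_remaining, poll_seconds=60, max_polls=None, sleep=time.sleep):
    """Poll get_remaining() until it returns 0; return the list of observed counts."""
    seen = []
    while True:
        remaining = get_remaining()
        seen.append(remaining)
        if remaining == 0:
            return seen
        if max_polls is not None and len(seen) >= max_polls:
            raise TimeoutError(f"still {remaining} background migrations remaining")
        sleep(poll_seconds)


# Stand-in for the real check: pretends the queue drains over three polls.
fake_counts = iter([42, 7, 0])
print(wait_until_zero(lambda: next(fake_counts), sleep=lambda seconds: None))
# -> [42, 7, 0]
```

Note that a check which returns zero on the very first poll, as described earlier in this thread, makes the loop return immediately. The loop cannot tell "nothing was ever queued" apart from "everything already finished".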
Hello, I tried upgrading with more versions in between. I went from 12.8.2 through every minor release to 12.10.14 and then upgraded to 13.0.0.
The result is the same: 1. There are never any background migrations running, and when checking the status with gitlab-rake, every migration listed is ‘up’.
2. Up to 12.10.14 everything works normally; after upgrading to 13.0.0 the Web Editor vanishes and pushing is no longer possible.
Is there anything else I could try?
Below is the error message I get when trying to push. How can I get the exact message behind the 500 error code? Maybe that would help in finding the issue.
$ git push
info: detecting host provider for 'http://kubespt-gitlab-testing.intern.arwinet.com/'...
fatal: unable to access 'http://kubespt-gitlab-testing.intern.arwinet.com/kubernative/kubeops/kubeops-doku.git/': The requested URL returned error: 500
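One way to get closer to the cause of the 500 is to look for the matching request in the server logs inside the container. The Python sketch below filters JSON-formatted log lines for responses with status 500 or higher. It assumes a log with one JSON object per line and a numeric `status` field, which, as far as I know, is how `/var/log/gitlab/gitlab-rails/production_json.log` is structured on Omnibus. The sample lines are made up, and the failing request may well be recorded in a different component's log instead.

```python
import json


def server_errors(log_lines):
    """Yield parsed JSON log entries whose HTTP status is 500 or higher."""
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(entry, dict):
            continue
        status = entry.get("status")
        if isinstance(status, int) and status >= 500:
            yield entry


# Made-up sample lines; real entries carry many more fields.
sample = [
    '{"method": "GET", "path": "/group/project.git/info/refs", "status": 200}',
    '{"method": "POST", "path": "/group/project.git/git-receive-pack", "status": 500}',
    "not json at all",
]
for entry in server_errors(sample):
    print(entry["method"], entry["path"], entry["status"])
# -> POST /group/project.git/git-receive-pack 500
```

To use it on a real file, pass an open file object instead of the sample list. Running `gitlab-ctl tail` inside the container while repeating the push is a simpler way to see all component logs at once.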
I was trying some more things from the docs and found that the query for getting the number of queued migrations cannot even be run in my container. Could this somehow be related to my problem? Or is it because the docs list queries that are not applicable to version 12?
gitlab-rails runner -e production 'puts Gitlab::Database::BackgroundMigration::BatchedMigration.queued.count'
Please specify a valid ruby command or the path of a script to run.
Run 'rails runner -h' for help.
uninitialized constant Gitlab::Database::BackgroundMigration
Did you mean? Gitlab::BackgroundMigration
I am following the officially supported upgrade path for GitLab, and that command started working in version 14.0.12 but was still failing in 13.12.15.
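Going by the report above, the check to run depends on the version you are currently on. The following Python sketch picks the command strings accordingly. The function name is hypothetical and the cutoff at 14.0 is taken only from the observation in this thread (failing on 13.12.15, working on 14.0.12). The older `Gitlab::BackgroundMigration.remaining` check is what I remember from the docs for 12.x and 13.x, so treat both as assumptions to verify.

```python
def pending_check_commands(version):
    """Return the background-migration check commands that should exist on `version`."""
    major, minor = (int(part) for part in version.split(".")[:2])
    commands = [
        # Older-style check; assumed to be available on the 12.x to 14.x versions discussed here.
        "gitlab-rails runner -e production 'puts Gitlab::BackgroundMigration.remaining'",
    ]
    if (major, minor) >= (14, 0):
        # Batched migrations check; reported in this thread to work from 14.0.12.
        commands.append(
            "gitlab-rails runner -e production "
            "'puts Gitlab::Database::BackgroundMigration::BatchedMigration.queued.count'"
        )
    return commands


for version in ("12.10.14-ee.0", "13.12.15", "14.0.12"):
    print(version, len(pending_check_commands(version)))
```

On a 12.x container this yields only the older check, which matches the `Did you mean? Gitlab::BackgroundMigration` hint in the error output above.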