We are trying to set up DR for our GitLab instance, and we will be using Nginx for routing traffic. However, there is a requirement that we should be able to mirror the pipelines from one instance onto the other. The idea behind this is that if one of the instances goes down, we would route all traffic to the remaining instance, and we would then need the pipelines of the compromised GitLab instance there as well.
You don’t explain much about how you are trying to achieve this. I can only guess that you are using a single GitLab instance that isn’t part of a High Availability (HA) setup, and that the second server is just sitting there unused, perhaps kept in sync with backup/restore or something similar? If that is not the case, then please explain in much more detail.
Since .gitlab-ci.yml lives in the repositories, the CI/CD configuration itself is already replicated along with the code. I expect that by "pipelines" you mean the data behind them from when pipelines were run, and that this is missing on the second server because you are not using a clustered PostgreSQL database or similar (e.g. you have two separate PostgreSQL databases, one per instance).
But from what you say, if you want data replicated live between multiple servers, then you should be looking at configuring HA properly. In that case, please refer to the GitLab documentation that explains how to do this: Reference architectures | GitLab
If you are trying to do it cheap and easy without a huge number of servers, then you are going to be limited in what you can do, and you won’t be able to live-sync the data, which is what I believe you are trying to achieve. For that you would need to go full-blown HA, which requires a lot more servers.
Thanks for your insightful response. We have two GitLab instances sitting in two different data centers, so basically geo-redundant. Traffic to these instances is routed with the help of Nginx, basically load balancing. So to answer your question: we do not have full HA (i.e. primary/secondary within each DC), but rather single instances in two different DCs. As of now there is no backup and restore established between these two servers, so they are two individual servers. Our requirement is this: let's say one of the instances goes down due to a disaster; we would then like to continue with its pipelines on the remaining instance. By pipelines I mean the running pipelines, with data awaiting the next action. So it is only on a need basis that we want to replicate the data from the failed instance to the running instance.
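For context, the Nginx side of this is roughly along the lines of the sketch below. It is a simplified illustration rather than our exact config, and the hostnames (gitlab-dc1.example.com, gitlab-dc2.example.com, gitlab.example.com) are placeholders. It sets up an active/backup upstream so traffic only goes to the second instance when the first one stops responding:

```bash
# Illustrative sketch only -- placeholder hostnames, not the real config.
# Drops in a minimal Nginx "active/backup" upstream: all traffic goes to DC1,
# and DC2 is only used when DC1 is considered down.
sudo tee /etc/nginx/conf.d/gitlab-dr.conf >/dev/null <<'EOF'
upstream gitlab_backend {
    server gitlab-dc1.example.com:443;          # active instance
    server gitlab-dc2.example.com:443 backup;   # only used if the active one fails
}

server {
    listen 443 ssl;
    server_name gitlab.example.com;
    # ssl_certificate / ssl_certificate_key omitted in this sketch

    location / {
        proxy_pass https://gitlab_backend;
        proxy_set_header Host $host;
    }
}
EOF

sudo nginx -t && sudo systemctl reload nginx   # validate and reload
```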
The idea of a clustered PostgreSQL database is very new to me; as of now we have individual PostgreSQL databases. Is there a way to use Disaster Recovery with GitLab Geo?
With the way you’ve got it set up, no, you cannot really do that. You would need to go full HA, and that gets complicated, requiring a lot more servers.
About the only realistic way to do it with your existing setup requires quite a bit more effort, and it amounts to active/standby: take backups every so often and immediately ship them to the second server. The second server could be scripted to restore that backup. However, you will need to accept that with such a solution you will not have a complete, up-to-the-minute or up-to-the-second data copy like you would with full HA.
You could schedule backups to run every hour, so in the worst case you would lose perhaps an hour of data. Repositories can be pushed again if commits are behind; issues are more problematic, and the same goes for pipeline data, so up to an hour of that could be missing.
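As a rough illustration of that active/standby approach (a sketch, not an official procedure; the standby hostname and the script path are placeholders), an hourly cron job on the active node could look something like this, with the standby scripted to restore the newest archive it receives:

```bash
#!/usr/bin/env bash
# Sketch only: hourly backup on the active node, shipped to a standby.
# Assumes Omnibus GitLab, SSH access to the standby, and the default backup
# directory /var/opt/gitlab/backups. "gitlab-standby.example.com" is a placeholder.
set -euo pipefail

STANDBY="gitlab-standby.example.com"

# Create an application backup (repositories, database, uploads, CI artifacts, ...)
gitlab-backup create STRATEGY=copy

# Ship the newest backup archive to the standby
LATEST=$(ls -t /var/opt/gitlab/backups/*_gitlab_backup.tar | head -n 1)
rsync -a "$LATEST" "${STANDBY}:/var/opt/gitlab/backups/"

# The secrets and config are NOT included in the backup archive, so copy them separately
rsync -a /etc/gitlab/gitlab-secrets.json /etc/gitlab/gitlab.rb "${STANDBY}:/etc/gitlab/"

# On the standby, a matching script would restore the newest archive, e.g.:
#   gitlab-backup restore BACKUP=<timestamp_of_latest_backup>
#
# Run from cron on the active node, e.g.:
#   0 * * * * /usr/local/bin/sync-gitlab-to-standby.sh
```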
You will never be able to do full up-to-the-minute or up-to-the-second data sync without going full HA. It is pretty much impossible to achieve with your existing setup.
If you had, for example, VMware vSphere with the appropriate licence in both data centres, you could use the replication functionality it offers to copy the active instance to the second data centre. That would give you a roughly up-to-the-minute or up-to-the-second copy of the VM, depending on bandwidth between the data centres. But that is outside the scope of GitLab itself, and if you don't have it, then it's not an option. It would, however, be possible with a single server in that case, handled entirely via vSphere.
Thanks iWalker, from my own analysis I had also arrived at the solution you proposed. My team members are proposing an external PostgreSQL that both of our instances connect to, and in that case I have checked that 3 PostgreSQL, 2 Sidekiq and 2 GitLab Rails servers would be required? Am I correct?
I think you should use GitLab Geo: basically, Geo requires PostgreSQL streaming replication between the two sites (the Geo secondary uses the read-only DB). Then, on the Geo side, there is a service that fetches all repository changes via HTTPS from the primary (in near real time) and applies them via the normal Gitaly mechanism on the Geo site (so it is a bit similar to logical replication, i.e. it replays the Git changes).
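To make that a little more concrete, a very rough sketch of the Omnibus side of a Geo setup is below. Note that Geo is a paid-tier (Premium/Ultimate) feature, and the real procedure, especially setting up the PostgreSQL streaming replication between the sites, has quite a few more steps; follow the official Geo documentation rather than this sketch:

```bash
# Sketch only -- the full Geo setup (in particular the PostgreSQL streaming
# replication between sites) has more steps; follow the official Geo docs.

# On the primary site: mark the node as the Geo primary and reconfigure.
sudo tee -a /etc/gitlab/gitlab.rb >/dev/null <<'EOF'
roles ['geo_primary_role']
EOF
sudo gitlab-ctl reconfigure

# On the secondary (Geo) site: mark it as a Geo secondary and reconfigure.
# Its PostgreSQL runs as a read-only streaming replica of the primary's database,
# while repository data is fetched from the primary over HTTPS and written
# through Gitaly, as described above.
sudo tee -a /etc/gitlab/gitlab.rb >/dev/null <<'EOF'
roles ['geo_secondary_role']
EOF
sudo gitlab-ctl reconfigure
```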