Gitlab HA is thoroughly miserable

Hi @nimbius and welcome to the Gitlab community.

I’m not sure how many users you have in total, but user count does play a part in HA planning, since it determines how many servers you need and how to load balance them. Sending 100 users to one server, for example, is going to be far quicker than sending 1000 users to one server. Anyway, that aside, let me continue.

Let’s assume you just want HA to safeguard your data, that you don’t have a huge number of users, and so sending them all to a single server isn’t going to be a performance issue. What I write below is theory, but should generally work - it’s more or less a comparison to what I did utilising a LAMP setup. We would need 7 servers for this. Technically it would be possible to do it with 5, provided the 3 GlusterFS servers have a high enough spec to also handle the Gitlab workload we give them.

For the first part, we would need 2 servers to act as load balancers. Why will become more apparent a little later on.

The second part would be to run Gitlab on 2 servers in active/standby (in case of server failure). These 2 servers would mount /etc/gitlab, /opt/gitlab and /var/opt/gitlab via the GlusterFS client from the other 3 servers. So 7 in total.

Alternatively: 2 servers for load balancing connections, and 3 servers running GlusterFS and Gitlab with /etc/gitlab, /opt/gitlab and /var/opt/gitlab mounted locally from GlusterFS - but that would require beefier hardware to deal with the IO of running Gitlab AND replicating data between the 3 servers.

For GlusterFS you need 3 servers for quorum, so it cannot be done with 2. Technically it can, but when one node fails you will have a problem: the surviving node carries on fine until you attempt to restart it, and the GlusterFS services will not come back up unless at least a second node is available. I have tried it, it’s not happy to work like this, so it’s not worth even attempting.
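
As a rough sketch, forming the 3-node pool and a replica-3 volume would look something like this (hostnames gluster01-03, brick paths and volume names are just placeholders):

```
# Run on gluster01: build the trusted pool with the other two nodes
gluster peer probe gluster02
gluster peer probe gluster03

# One replica-3 volume per mount point; shown here for /var/opt/gitlab
# (gitlab-etc and gitlab-opt would be created the same way)
gluster volume create gitlab-var replica 3 \
  gluster01:/data/glusterfs/gitlab-var/brick1 \
  gluster02:/data/glusterfs/gitlab-var/brick1 \
  gluster03:/data/glusterfs/gitlab-var/brick1

# Enforce server-side quorum so a lone surviving node won't carry on alone
gluster volume set gitlab-var cluster.server-quorum-type server
gluster volume start gitlab-var
```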

But let’s assume the 7-server setup.

First Layer (LB) - 2 x LB, active/standby. The VIP address will move between the two depending on what happens with the servers, eg: service failure, server inaccessible for some reason, rebooted. Therefore connections will always get routed to the second layer.
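
Something like keepalived handles the floating VIP on the pair; a minimal sketch (the interface name, VIP and password are placeholders, and the second LB gets state BACKUP with a lower priority):

```
# /etc/keepalived/keepalived.conf on lb01 (MASTER)
vrrp_instance VI_LB {
    state MASTER
    interface eth0              # placeholder NIC name
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass s3cret        # placeholder
    }
    virtual_ipaddress {
        192.168.1.10/24         # the VIP users actually connect to
    }
}
```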

Second Layer (GITLAB-LB) - 2 x servers with Gitlab installed, with the GlusterFS client mounting /etc/gitlab, /opt/gitlab and /var/opt/gitlab. Obviously with CPU/RAM specs that meet your Gitlab user count. These will also work in active/standby mode, therefore the first layer (LB) will route to whichever server (GITLAB-LB) is active. This layer will also have a VIP address like the first layer.
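
One way (among several) to make this pair properly active/standby is to let keepalived start Gitlab on whichever node takes over the VIP. A rough sketch, assuming Omnibus Gitlab and a keepalived instance like the one above (the script name is hypothetical):

```
# In this layer's vrrp_instance, add:
#     notify_master "/usr/local/bin/gitlab-takeover.sh"

# /usr/local/bin/gitlab-takeover.sh (hypothetical)
#!/bin/bash
set -e
mount -a          # make sure the GlusterFS mounts are in place first
gitlab-ctl start  # bring up the Gitlab services on the now-active node
```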

Third Layer (GlusterFS) - 3 x servers replicating data. Obviously a 1Gbps network connection isn’t going to cut it, so at least 10GbE will be needed here, or perhaps fibre. You will be replicating the Gitlab repository data as well as the PostgreSQL data, because both live under /var/opt/gitlab; /etc/gitlab and /opt/gitlab don’t see that many changes.

This layer also runs LB services with a VIP address, because we use it to mount the GlusterFS volumes at the second layer where Gitlab is running. Whilst you could do this with fstab entries pointing at gluster01, gluster02 or gluster03 (assuming those are their names), you would have a problem mounting later if fstab points at gluster01 and it’s not available. You can edit fstab of course, but the LB/VIP solution simplifies things, as the VIP address floats between the three machines. That way you can put, eg: gluster-vip in fstab, and it will mount irrespective of which server currently holds the VIP. Maybe there is a better way, but this is theory anyway - I have tested it this way with my LAMP setup.
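
The fstab entries on the second-layer Gitlab servers would then look roughly like this (the volume names and the gluster-vip hostname are placeholders; the commented line shows the native client’s backup-volfile-servers option as a possible alternative to the VIP):

```
# /etc/fstab on the second-layer Gitlab servers
gluster-vip:/gitlab-etc  /etc/gitlab      glusterfs  defaults,_netdev  0 0
gluster-vip:/gitlab-opt  /opt/gitlab      glusterfs  defaults,_netdev  0 0
gluster-vip:/gitlab-var  /var/opt/gitlab  glusterfs  defaults,_netdev  0 0

# Alternative without a VIP: name one node plus fallbacks for the volfile
# gluster01:/gitlab-var  /var/opt/gitlab  glusterfs  defaults,_netdev,backup-volfile-servers=gluster02:gluster03  0 0
```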

In theory it’s possible, and provided the hardware is up to scratch, it should be fine performance-wise. So, communication-wise:

Users --> First Layer (LB) --> Second Layer (GITLAB-LB) --> Third Layer (GlusterFS Data replication)

First Layer - HTTPS and SSH redirected to Second Layer
Second Layer - Gitlab services HTTPS/SSH just like a single server setup
Third Layer - Your data is replicated so that one of the two servers in the second layer can mount partitions and continue working in the event of service/hardware failure.

You can redirect HTTPS and SSH, for example, which would generally be enough for browsing the Gitlab web UI, as well as pushing and pulling via HTTPS or SSH.
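
On the first-layer LBs that redirection can be plain TCP passthrough, eg with HAProxy (the second-layer VIP address is a placeholder, and the LB hosts’ own sshd would need to move off port 22 first):

```
# /etc/haproxy/haproxy.cfg (first layer) - TCP passthrough to the
# second-layer VIP; 192.168.1.20 is a placeholder
listen gitlab_https
    bind *:443
    mode tcp
    server gitlab-vip 192.168.1.20:443 check

listen gitlab_ssh
    bind *:22
    mode tcp
    server gitlab-vip 192.168.1.20:22 check
```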

But again, this is all theory; it depends on what hardware is available, and it would seriously need testing to see what the performance is like. For PostgreSQL, certain Gluster settings need to be taken into account to ensure we don’t get bottlenecks.
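
To give an idea of the kind of Gluster tuning meant here - these are illustrative only, not a tested recipe, so benchmark against your own workload before trusting any of them:

```
# Per-volume options often adjusted for database-style small random IO
gluster volume set gitlab-var performance.quick-read off
gluster volume set gitlab-var performance.read-ahead off
gluster volume set gitlab-var performance.io-cache off
gluster volume set gitlab-var performance.stat-prefetch off
gluster volume set gitlab-var network.ping-timeout 10
```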

So if you are not worried about the number of users and just want the HA equivalent of a single Gitlab server servicing your users, that could potentially do it.