Redis error after using GitLab for a while

After running GitLab EE (Omnibus) 15.10 for a while in a testing environment, I start getting a 500 error ("We’re sorry. Something went wrong on our end."). When that happens, /var/log/gitlab/redis/current fills with the following messages:

2023-03-28_12:23:28.05427 12184:M 28 Mar 2023 09:23:28.053 * 1 changes in 900 seconds. Saving...
2023-03-28_12:23:28.05430 12184:M 28 Mar 2023 09:23:28.054 * Background saving started by pid 14667
2023-03-28_12:23:29.09368 14667:C 28 Mar 2023 09:23:29.093 # Write error saving DB on disk: No space left on device
2023-03-28_12:23:29.15996 12184:M 28 Mar 2023 09:23:29.159 # Background saving error
2023-03-28_12:23:34.08582 12184:M 28 Mar 2023 09:23:34.085 * 1 changes in 900 seconds. Saving...
2023-03-28_12:23:34.08584 12184:M 28 Mar 2023 09:23:34.085 * Background saving started by pid 14672
2023-03-28_12:23:35.12277 14672:C 28 Mar 2023 09:23:35.122 # Write error saving DB on disk: No space left on device
2023-03-28_12:23:35.19328 12184:M 28 Mar 2023 09:23:35.193 # Background saving error
2023-03-28_12:23:40.02042 12184:M 28 Mar 2023 09:23:40.020 * 1 changes in 900 seconds. Saving...
2023-03-28_12:23:40.02066 12184:M 28 Mar 2023 09:23:40.020 * Background saving started by pid 14676
2023-03-28_12:23:41.04726 14676:C 28 Mar 2023 09:23:41.047 # Write error saving DB on disk: No space left on device
2023-03-28_12:23:41.12660 12184:M 28 Mar 2023 09:23:41.126 # Background saving error

We don’t have disk space problems. In fact, if I restart the service, everything works fine again.

Could it be related to this message in the same log, which appears when the service starts?

2023-03-28_12:29:38.56456 14897:M 28 Mar 2023 09:29:38.559 # WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. Being disabled, it can can also cause failures without low memory condition, see https://github.com/jemalloc/jemalloc/issues/1328. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.

Thanks very much.

I can confirm that this is still happening after adding ‘vm.overcommit_memory = 1’ to /etc/sysctl.conf and rebooting. We are now in a production environment with version 15.10.1.
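For reference, the value can be double-checked after the reboot with standard sysctl tooling, roughly like this:

# Live kernel setting; it should print "vm.overcommit_memory = 1" if the change took effect
sysctl vm.overcommit_memory

# Equivalent raw read from /proc
cat /proc/sys/vm/overcommit_memory

# Re-apply /etc/sysctl.conf without a reboot if needed
sudo sysctl -p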

Any hint is appreciated. Thanks.

Setting vm.overcommit_memory = 1 isn’t going to be a good option - although it depends on how much RAM and swap your machine has. The options are:

0: heuristic overcommit (this is the default)
1: always overcommit, never check
2: always check, never overcommit

Option 0 (the heuristic) will cause the OOM killer to kick in if a process tries to use more RAM than is available. Option 2 will always check that there is enough RAM and fail the allocation if not - that can be problematic; I have tried it before on other servers and it can be a bit hit and miss if the server isn’t configured correctly. Option 1 just doesn’t sound safe to me: overcommitting irrespective of the amount of RAM/swap available.

This link explains a bit from the Redis side: https://redis.io/docs/get-started/faq/

The problem here is: if the Redis dataset is 3 GB and you only have 2 GB of RAM with no swap configured, how is that going to work? It cannot use memory it doesn’t have. So I don’t see how overcommitting would be useful unless there is plenty of RAM/swap available for it to actually save that 3 GB.

So the biggest question here is: how much CPU/RAM does your GitLab VM have? It should have at least 4 CPUs and at least 8 GB of RAM. Also, how much swap is configured? E.g. my GitLab instance has 8 GB RAM and 16 GB swap. For lower amounts of RAM, it’s best for swap to be 2 x RAM.
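If you’re not sure what the VM actually has, something along these lines will show it (standard Linux tools; swapon is part of util-linux):

# Number of CPUs visible to the VM
nproc

# RAM and swap totals in human-readable form
free -h

# Per-device swap configuration
swapon --show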

The error from the log says there is no space left on the device, but you can also get something like this if the partition has run out of inodes. You can use df -i to check whether any partition has zero inodes available.
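For example (output columns vary a little between distributions, and the /var/opt/gitlab path assumes a default Omnibus layout):

# Inode usage per filesystem; look at the IFree/IUse% columns
df -i

# Or just the partition holding the Redis/GitLab data
df -i /var/opt/gitlab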

No problem with inodes; I have checked that.

The VM had 6 CPUs and 4 GB of RAM. As I suspected the RAM was too low, I increased it to 6 GB after the last time the problem occurred (although the GitLab memory requirements for an environment of only about 10 users like ours say that “4 GB RAM is the required minimum”; see Installation system requirements | GitLab). I also set swappiness to 10, as the same document suggests.
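For completeness, this is more or less how I applied the swappiness change (I’m assuming /etc/sysctl.conf here rather than a drop-in file under /etc/sysctl.d/):

# Persist the value recommended by the GitLab requirements page
echo "vm.swappiness = 10" | sudo tee -a /etc/sysctl.conf

# Apply without a reboot and confirm the running value
sudo sysctl -p
cat /proc/sys/vm/swappiness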

The swap size is 2 GB.

So, in our scenario, what would you recommend? Should I add 2 GB more RAM? What about the swap? And should we leave vm.overcommit_memory at 0?

Thanks!

Well, 4 GB would be OK if you disable things like Grafana, Prometheus, etc.; otherwise you need more RAM. Swap is too small in this instance and should be 8 GB (2 x RAM).

If you don’t disable the memory-hungry components, then you need 8 GB of RAM; swap you can then put at 8 GB, or 16 GB if problems still occur.
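If you do disable them, it’s a couple of lines in /etc/gitlab/gitlab.rb followed by a reconfigure - roughly like this (double-check the exact keys against the Omnibus docs for your version):

# In /etc/gitlab/gitlab.rb
prometheus_monitoring['enable'] = false   # bundled monitoring stack (Prometheus and exporters)
grafana['enable'] = false                 # bundled Grafana

# Then apply the change
sudo gitlab-ctl reconfigure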

So far, we have been giving more resources to our GitLab server. It now has 10 GB of RAM and 16 GB of swap. We left vm.overcommit_memory at its default of 0. We also disabled Prometheus; Grafana was disabled by default. Although the server is more stable, the problem still occurs every 3 or 4 days.

We gave the server 2 GB of RAM more than the recommended 8 (10 in total), as there are some Gluster processes synchronizing the GitLab data directory all the time.

Any other ideas? Thanks very much.

As a workaround, I will configure a cron job to reboot the host every day. Any other hint is very welcome. Thanks very much.
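For anyone landing here later, the workaround is just a root crontab entry along these lines (the 04:00 time is an arbitrary choice on my side):

# Edit root's crontab
sudo crontab -e

# Reboot the host every day at 04:00
0 4 * * * /sbin/shutdown -r now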