Analyse disk usage

After updating to gitlab-ce 14.0.5 (from 13.9.1 via 13.12.6), my /var/opt/gitlab partition fills up every day. I resized the disks (so 20% is free), but by the next day the disk is full again. How can I determine which objects are producing this growth? I tried to find them with “du -sk *”, but I’ve found that this is not helpful below /var/opt/gitlab/git-data/repositories/@hashed/ .

Any ideas?

Hi,

I don’t think it has anything to do with the version, since I am on 14.1.x and I don’t have this problem. The easiest way is this:

cd /var/opt/gitlab
du -sh * | grep G

will at least let you filter for entries measured in gigabytes, so you can concentrate on the directories holding the most data. If the backups directory is not what is consuming the space, it is most likely your repository data. In that case, if a lot of commits with large amounts of data are being made each day, you will probably need to allocate a lot more disk space to your server.
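Piping to grep G drops anything reported in terabytes and also matches any name containing a capital G. A slightly more robust variant ranks every entry by size instead. This is a minimal sketch, assuming GNU du and sort (for sort -h); the function name largest_entries is hypothetical:

```shell
#!/usr/bin/env bash
# Print the disk usage of each entry directly below a directory, largest last.
# Assumes GNU du/sort: sort -h understands the K/M/G/T suffixes of du -h.
largest_entries() {
  local dir="${1:-/var/opt/gitlab}"   # default Omnibus data directory
  local count="${2:-10}"              # how many entries to show
  # -x: stay on this filesystem; -s: one total per argument; -h: human-readable sizes
  du -xsh "$dir"/* 2>/dev/null | sort -h | tail -n "$count"
}
```

For example, running largest_entries /var/opt/gitlab 5 as root shows the five biggest entries, and largest_entries /var/opt/gitlab/git-data repeats the check one level down.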

But only you can tell what activity is happening on your server, and whether the growth comes from commits and large repository data or not.

That should at least help you get started drilling down into each directory, and you can go from there.

Thanks for your hints. I compared the usage as you suggested. At the moment it seems that only “/var/opt/gitlab/git-data” (and everything below it) is growing this fast.
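Below @hashed the directory names do not reveal which project they belong to. GitLab's hashed storage derives each path from the SHA-256 of the project ID (@hashed/<first 2 hex chars>/<next 2 hex chars>/<hash>.git), so the following sketch lists the biggest repositories and computes the path for a known project ID. It assumes the default Omnibus repository location and the sha256sum tool; the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Assumption: default Omnibus repository location (override via REPO_ROOT).
REPO_ROOT="${REPO_ROOT:-/var/opt/gitlab/git-data/repositories}"

# Print the relative hashed-storage path for a project ID: @hashed/ab/cd/<sha256>.git
hashed_repo_path() {
  local hash
  # Hash the decimal ID without a trailing newline and keep only the hex digest
  hash=$(printf '%s' "$1" | sha256sum | cut -d' ' -f1)
  printf '@hashed/%s/%s/%s.git\n' "${hash:0:2}" "${hash:2:2}" "$hash"
}

# List the N largest repositories under @hashed (sizes in KiB), largest last.
largest_repos() {
  local count="${1:-10}"
  du -sk "$REPO_ROOT"/@hashed/*/*/*.git 2>/dev/null | sort -n | tail -n "$count"
}
```

For example, largest_repos 5 shows the five biggest repository directories, and hashed_repo_path 1 prints the path for project ID 1. A project's wiki shows up as <hash>.wiki.git next to its main repository, since the glob matches both.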

postgresql and gitlab-rails are large too, but there is no significant growth there.

I asked my colleagues, but we couldn’t find a reason for this behavior, so I need to find out more about what is going on there.


I’m not very familiar with git, but I just noticed that the 6% growth happened on a time-based schedule: there was a jump at 12:00 AM. The crontab is empty. Are there any mechanisms in git that run on a regular schedule?
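Since the growth seems to follow a schedule, it can help to record the sizes at regular intervals and see exactly when and where they jump. This is a minimal sketch, assuming GNU du and date; the function name snapshot_usage and the log path /tmp/gitlab-du.log are arbitrary, hypothetical choices:

```shell
#!/usr/bin/env bash
# Append one line per top-level directory: UTC timestamp, size in KiB, path.
# Call it repeatedly (e.g. hourly) and compare the lines over time.
snapshot_usage() {
  local dir="${1:-/var/opt/gitlab}"
  local log="${2:-/tmp/gitlab-du.log}"   # hypothetical log location
  local now
  now=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  # du -k prints "<size><TAB><path>"; prefix each line with the timestamp
  du -xsk "$dir"/* 2>/dev/null | while IFS=$'\t' read -r size path; do
    printf '%s\t%s\t%s\n' "$now" "$size" "$path"
  done >> "$log"
}
```

For example, running while true; do snapshot_usage; sleep 3600; done as root in a screen or tmux session overnight, then grep git-data /tmp/gitlab-du.log, shows how that directory's size changed hour by hour.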

Well, I ran this to check the crontabs of every user that GitLab could potentially use:

for gituser in $(grep -i git /etc/passwd | cut -d: -f1); do crontab -l -u "$gituser"; done
no crontab for gitlab-www
no crontab for git
no crontab for gitlab-redis
no crontab for gitlab-psql
no crontab for gitlab-prometheus

No cron jobs, so this has nothing to do with cron. I'm not sure whether Sidekiq does anything at that particular time of day.

You can check Sidekiq here:

https://gitlab.example.com/admin/background_jobs

Replace gitlab.example.com with the name of your server. There is also a Cron tab listed there, so it could be one of those jobs.

Under Admin Area → Background Jobs → Cron I can see a lot of jobs.
At the top it says "GitLab uses Sidekiq to process background jobs".
I’m not sure whether non-admin users can change these. Where do these jobs usually come from?

Yep, those cron jobs are required by GitLab, and no, non-admins cannot create them, so perhaps GitLab itself is doing something at 00:00. I don’t see such disk usage problems on my instance, so I'm not sure what your GitLab instance is doing.


Sounds to me like someone is building something huge in a CI job and archiving it.
Do you use GitLab CI?


We just found a huge repository. Hopefully this was the cause of our problems.

I’m going to check this too.

Admin Area → Projects → sorting by “Largest repository” was very useful.

One thing I don’t understand:
In the web GUI I noticed a project with 4.3 GB of disk usage. Then I took a look in git-data, where it only consumes 62 MB!?


So what does “Storage: 4.3 GB” mean?

Since we deleted the huge repository, everything seems to be OK. I don’t understand why that fixed it, but for now the daily growth of about 10% seems to have stopped. Thanks for your help, @iwalker and @OIiM.

It’s just what I guessed: someone was accumulating build artifacts through CI, probably from a job that runs on every commit with its artifacts set to expire_in: never.
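A project's "Storage" figure in the web UI generally counts more than the Git repository (for example job artifacts and LFS objects), which would explain a large total next to a small git-data directory. To check whether artifacts are behind it, one can compare both locations on disk. The sketch below assumes local (not object) storage, the default Omnibus artifacts directory /var/opt/gitlab/gitlab-rails/shared/artifacts, and that artifacts are grouped under the same SHA-256-of-project-ID prefix as hashed repositories; verify all three against your own configuration. The function name project_usage is hypothetical:

```shell
#!/usr/bin/env bash
# Assumptions (verify in /etc/gitlab/gitlab.rb): local storage, default Omnibus paths.
REPO_ROOT="${REPO_ROOT:-/var/opt/gitlab/git-data/repositories}"
ARTIFACTS_ROOT="${ARTIFACTS_ROOT:-/var/opt/gitlab/gitlab-rails/shared/artifacts}"

# Print "<KiB><TAB><label>" for a project's Git repository and its job artifacts.
project_usage() {
  local id="$1" hash prefix entry size
  # Same SHA-256-of-project-ID prefix that hashed storage uses (assumed for artifacts)
  hash=$(printf '%s' "$id" | sha256sum | cut -d' ' -f1)
  prefix="${hash:0:2}/${hash:2:2}/${hash}"
  for entry in "repository:$REPO_ROOT/@hashed/$prefix.git" \
               "artifacts:$ARTIFACTS_ROOT/$prefix"; do
    # ${entry#*:} is the path, ${entry%%:*} the label; missing dirs report 0
    size=$(du -sk "${entry#*:}" 2>/dev/null | cut -f1)
    printf '%s\t%s\n' "${size:-0}" "${entry%%:*}"
  done
}
```

For example, project_usage 42 (42 being a hypothetical project ID) prints two lines; a large artifacts value next to a small repository value would match the 4.3 GB versus 62 MB discrepancy described above.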
