Artifacts storage is growing out of bounds


Hi,

we have some repos with huge artifact sizes:

select
  pg_size_pretty(sum(ci_job_artifacts.size)),
  ci_job_artifacts.file_type,
  count(ci_job_artifacts.file_type),
  to_char(ci_job_artifacts.created_at, 'YYYY-MM') as month,
  projects.id
from namespaces
  join projects on namespaces.id = projects.namespace_id
  join ci_job_artifacts on projects.id = ci_job_artifacts.project_id
where ci_job_artifacts.size is not null
group by ci_job_artifacts.file_type, month, projects.id
order by sum(ci_job_artifacts.size) desc
limit 10;

Result:

pg_size_pretty | file_type | count | month   | id
---------------+-----------+-------+---------+-----
 72 GB         |         3 | 96641 | 2018-11 | 537
 28 GB         |         1 |   257 | 2018-11 | 616
 27 GB         |         3 | 20881 | 2018-10 | 537
 23 GB         |         3 | 17499 | 2018-09 | 537
 19 GB         |         3 | 15539 | 2018-07 | 537
 17 GB         |         3 | 17675 | 2018-04 | 537
 16 GB         |         3 | 15936 | 2018-05 | 537
 16 GB         |         1 |  1358 | 2018-11 | 537
 14 GB         |         3 | 12466 | 2018-06 | 537
 14 GB         |         3 | 14568 | 2018-03 | 537

All these jobs have expiry limits set; still, the logs seem to stay (file_type=3 seems to be the trace, i.e. the log output of the jobs).

So I wrote a script that uses the API to erase jobs via https://git.dev.zgrp.net/api/v4/projects/{project}/jobs/{job}/erase.
That takes ages, and afterwards the output of the query above does not match what I see in the user interface.
It would probably not work very well for the projects above either, as they have/need MANY jobs.
Is there a better way? The script I wrote so far isn't even applicable to those projects yet: time-based cleanup alone isn't enough for them, as they need the logs from tagged commits kept forever, as sketched below.
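
For reference, a minimal sketch of what such a script could look like, assuming python-requests, a PRIVATE-TOKEN with sufficient rights, and that "skip any job whose commit is currently tagged" is an acceptable reading of "keep logs from tagged commits forever". The cutoff date, the token, and that heuristic are placeholders/assumptions, not part of the original setup; the endpoints (GET /projects/:id/jobs, GET /projects/:id/repository/tags, POST /projects/:id/jobs/:job_id/erase) are standard GitLab v4 API:

# erase_old_jobs.py - a sketch, not production code: erases old jobs'
# traces/artifacts via the GitLab v4 API, skipping tagged commits.
from datetime import datetime, timezone

import requests

GITLAB = "https://git.dev.zgrp.net/api/v4"          # host from the URL above
HEADERS = {"PRIVATE-TOKEN": "..."}                   # placeholder admin token
CUTOFF = datetime(2018, 6, 1, tzinfo=timezone.utc)   # placeholder: keep newer jobs


def paginated(url, **params):
    """Yield all items from a page-based GitLab API endpoint."""
    page = 1
    while True:
        r = requests.get(url, headers=HEADERS,
                         params={**params, "per_page": 100, "page": page})
        r.raise_for_status()
        items = r.json()
        if not items:
            return
        yield from items
        page += 1


def erase_old_jobs(project_id):
    # Commits that are currently tagged must keep their logs forever.
    tagged = {tag["commit"]["id"]
              for tag in paginated(f"{GITLAB}/projects/{project_id}/repository/tags")}

    for job in paginated(f"{GITLAB}/projects/{project_id}/jobs"):
        created = datetime.fromisoformat(job["created_at"].replace("Z", "+00:00"))
        if created >= CUTOFF or job["commit"]["id"] in tagged:
            continue  # too new, or belongs to a tagged commit
        # Erasing removes the trace and artifacts but keeps the job row itself.
        r = requests.post(f"{GITLAB}/projects/{project_id}/jobs/{job['id']}/erase",
                          headers=HEADERS)
        print(job["id"], r.status_code)


if __name__ == "__main__":
    erase_old_jobs(537)  # the biggest offender from the query above

This still issues one request per job, so for a project with ~96k jobs it will take a while; it only demonstrates the tag-aware filtering, not a faster deletion path.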

This is GitLab 11.4.