How do I purge artifacts to reduce storage space?

Version: gitlab.com

My project has 45 MB of files but uses 21 GB of storage. I would like to clean that up.

Actions taken: According to my admin, the consumption comes from artifacts, so I’ve deleted all pipelines. It DIDN’T reduce the storage used at all. What should I do?
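
One way to confirm where the storage actually goes is the Projects API with `statistics=true`, which returns a `statistics` object that includes `job_artifacts_size`. The following is only a sketch under assumptions: the project ID and token are placeholders, `stats_url` and `show_storage_stats` are made-up helper names, and `curl` and `jq` are assumed to be installed. The reported figures may also lag behind recent deletions.

```shell
# Placeholders (assumptions): replace with your own values.
server="gitlab.com"
project_id="xxxxxx"
token="yyyyy"

# Hypothetical helper: build the project URL with statistics enabled.
stats_url() {
    printf 'https://%s/api/v4/projects/%s?statistics=true\n' "$server" "$project_id"
}

# Hypothetical helper: print the storage breakdown (sizes are in bytes).
show_storage_stats() {
    curl --silent --fail --header "PRIVATE-TOKEN: ${token}" "$(stats_url)" \
        | jq '.statistics'
}

# Usage: show_storage_stats
```

If `job_artifacts_size` accounts for most of `storage_size`, artifacts are indeed the thing to clean up.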

You can try this:

Thanks a lot @alexk

@dvianna, Tried it? Did it work?

Hey Alex, will try soon, I will let you know!
Thanks for the followup

After reading the instructions and running the command, I noticed that I had deleted the jobs, which means I lost all the job IDs from 1 year ago until now. So I can’t use these scripts, as they depend on knowing the job IDs.

Hey there,
is this any help?

You could adapt the script from above with [1]:

curl --request DELETE --header "PRIVATE-TOKEN: <your_access_token>" "https://gitlab.example.com/api/v4/projects/1/jobs/1/artifacts"

[Edit]:
Could you simply brute-force the job IDs? It’s just an incrementing number…
Maybe you first want to check that you don’t delete deployments or releases by cross-checking the commit hashes [2, 3].

Regards

[1] https://docs.gitlab.com/ee/api/job_artifacts.html#delete-artifacts
[2] https://docs.gitlab.com/ee/api/deployments.html#list-project-deployments
[3] https://docs.gitlab.com/ee/api/releases/#list-releases

So this works:

    # project_id: find it at https://gitlab.com/[organization name]/[repository name]/edit, in the "General project settings" tab
    project_id="xx"
    # token: find it at https://gitlab.com/profile/personal_access_tokens
    token="-yy"
    server="gitlab.com"
    # Job IDs to erase
    job_ids=(1)
    for job_id in "${job_ids[@]}"; do
        URL="https://$server/api/v4/projects/$project_id/jobs/$job_id/erase"
        echo "$URL"
        echo "Job ID being deleted is $job_id"
        curl --request POST --header "PRIVATE-TOKEN: ${token}" "$URL"
        # Blank line between responses
        echo
    done

But I did some calculations and brute force will be impossible.
I checked the job IDs from my first pipeline to the last one, and there are 32.000 possibilities.

Is there a way to delete all jobs, or to retrieve all jobs that ever existed so I can dump them into that loop? I’d rather not set up a new repository now, since there are a lot of things involved.

I’ve tried these, but they didn’t work:

curl --header "PRIVATE-TOKEN: -zzz" "https://gitlab.com/api/v4/projects/projectid/pipelines"

curl --header "PRIVATE-TOKEN: zzz" 'https://gitlab.com/api/v4/projects/projectid/pipelines/6/bridges?scope[]=pending&scope[]=failed'

curl --globoff --header "PRIVATE-TOKEN: zzz" "https://gitlab.com/api/v4/projectid/jobs?scope[]=created&scope[]=failed"
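
For reference, the Jobs API lists a project's jobs under `/projects/:id/jobs`, and the result is paginated. Below is a rough sketch of walking all pages and collecting the job IDs. It assumes `curl` and `jq` are available, uses placeholder credentials, and the helper names `jobs_page_url` and `list_job_ids` are made up for illustration. If the pipelines have already been deleted, the listing may simply come back empty.

```shell
# Placeholders (assumptions): replace with your own values.
server="gitlab.com"
project_id="xxxxxx"
token="yyyyy"

# Hypothetical helper: URL of one page of the project's job list.
jobs_page_url() {
    printf 'https://%s/api/v4/projects/%s/jobs?per_page=100&page=%s\n' \
        "$server" "$project_id" "$1"
}

# Hypothetical helper: print every job ID, one per line, page by page.
# Stops at the first empty page (or when a request fails).
list_job_ids() {
    local page=1 ids
    while :; do
        ids=$(curl --silent --fail --header "PRIVATE-TOKEN: ${token}" \
                  "$(jobs_page_url "$page")" | jq -r '.[].id') || return 1
        [ -z "$ids" ] && break
        printf '%s\n' "$ids"
        page=$((page + 1))
    done
}

# Usage: list_job_ids > job_ids.txt
```

The resulting file could then be read into the `job_ids` list of an erase loop instead of guessing the IDs.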

I made it work with this ugly brute force:

    # project_id: find it at https://gitlab.com/[organization name]/[repository name]/edit, in the "General project settings" tab
    project_id="xxxxxx"
    
    # token: find it at https://gitlab.com/profile/personal_access_tokens
    token="yyyyy"
    server="gitlab.com"
    
    # Take the range from the oldest known job ID to the latest known one, then brute-force it.
    # Used in the case where you deleted the pipelines and can't retrieve the job IDs.
    # https://stackoverflow.com/questions/52609966/for-loop-over-sequence-of-large-numbers-in-bash
    for (( job_id = 59216999; job_id <= 190239535; job_id++ )); do
        echo "Job ID being deleted is $job_id"
        curl --request POST --header "PRIVATE-TOKEN: ${token}" "https://${server}/api/v4/projects/${project_id}/jobs/${job_id}/erase"
        # Two blank lines between responses
        echo
        echo
    done

This approach is super slow. The range between the first job a year ago and now is huge; I left the script running for a few days and then it crashed.
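
If brute force is the only option, a variation that might help is to send several requests at once and to work through the range in chunks, logging the result of each ID so a crash does not mean starting over. This is a sketch, not a tested recipe: `erase_url` and `erase_range` are hypothetical helper names, the credentials are placeholders, and it assumes GNU `seq` plus an `xargs` that supports `-P`. gitlab.com applies rate limits, so a high parallelism value may just get throttled.

```shell
# Placeholders (assumptions): replace with your own values.
server="gitlab.com"
project_id="xxxxxx"
token="yyyyy"

# Hypothetical helper: erase endpoint for one job ID.
erase_url() {
    printf 'https://%s/api/v4/projects/%s/jobs/%s/erase\n' \
        "$server" "$project_id" "$1"
}

# Hypothetical helper: erase every job ID in [first, last], sending
# "parallel" requests at a time (default 4). Prints "<job_id> <http_status>"
# for each request. With DRY_RUN=1 it only prints the URLs it would call.
erase_range() {
    local first="$1" last="$2" parallel="${3:-4}"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        seq "$first" "$last" | while read -r id; do erase_url "$id"; done
        return 0
    fi
    seq "$first" "$last" | xargs -P "$parallel" -I{} \
        curl --silent --output /dev/null --write-out '{} %{http_code}\n' \
             --request POST --header "PRIVATE-TOKEN: ${token}" \
             "https://${server}/api/v4/projects/${project_id}/jobs/{}/erase"
}

# Usage: erase_range 59216999 59226999 8 >> erase.log
```

Running one chunk at a time and appending to a log means a crash costs at most the current chunk, and the status codes in the log show which IDs actually belonged to the project.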

Question: How do I retrieve the jobs after they were deleted in the UI?