Remove all artifacts (no expire_in option)

Hi all,

How do I clean up all the artifacts (from every test pipeline, created without an “expire_in” option) that take up a lot of space on my personal server?

Looking at the issues page, this appears to be a feature that has been worked on and is waiting for one more approval before being merged:

Until then, you would have to go to the folder on the server and delete them manually.

If you have hundreds of builds, doing it manually is not practical. You can use the REST API with curl and your favorite scripting language:

#!/bin/sh
project_id=456
token=secret
server=myserver
start_job=2
end_job=8
for job_id in $(seq $start_job $end_job)
do 
   curl --request POST --header "PRIVATE-TOKEN:${token}" "https://${server}/api/v4/projects/${project_id}/jobs/${job_id}/erase"
done


Thanks for this snippet. I tried it out and got a tonne of {"error": "404 not found"}. I’m hoping that’s from jobs which didn’t have artifacts.

After a little poking, I found that this snippet worked for me on macOS:

#!/bin/bash
project_id="0000000"
token="teenagemutantninjaturtles"
server="gitlab.com"
start_job=30592507
end_job=30626126

for job_id in $(jot - $start_job $end_job)
do 
	URL="https://$server/api/v4/projects/$project_id/jobs/$job_id/erase"
	echo "$URL"
	curl --request POST --header "PRIVATE-TOKEN:${token}" "$URL"
	printf '\n'
done
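
To tell the 404s apart from the erases that actually went through, it can help to tally the status codes instead of reading raw curl output. Below is a minimal Python sketch of that idea, using only the standard library. The helper names `erase_job` and `tally_erase` are made up for illustration, and what each status code means on your GitLab version is something to check against the API docs rather than take from this snippet.

```python
import urllib.error
import urllib.request
from collections import Counter


def erase_job(server, project_id, job_id, token):
    """POST to the same /erase endpoint as the snippets above; return the HTTP status code."""
    url = f"https://{server}/api/v4/projects/{project_id}/jobs/{job_id}/erase"
    req = urllib.request.Request(url, method="POST", headers={"PRIVATE-TOKEN": token})
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions; keep just the code.
        return err.code


def tally_erase(job_ids, erase):
    """Call erase(job_id) for every ID and count how often each status code came back."""
    counts = Counter()
    for job_id in job_ids:
        counts[erase(job_id)] += 1
    return counts
```

With the values from the script above, something like `tally_erase(range(30592507, 30626127), lambda j: erase_job("gitlab.com", "0000000", j, token))` would return a counter of status codes, which makes it easy to see whether anything other than 404 came back.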

I’ve made another version (in bash) that explicitly lists your own jobs using the JavaScript browser console (useful if you use GitLab CI SaaS).

http://blog.fgribreau.com/2018/01/remove-gitlab-ci-artifacts-in-batch.html


Thanks for the hints. Looping over every job ID can be slow; it is better to query the job list first. I am using this Python script now:

import requests

project = "my-project"
token = "mytoken"

# List the project's jobs (without paging parameters, only the first page is returned).
response = requests.get("https://server/api/v4/projects/%s/jobs?private_token=%s" % (project, token))
response.raise_for_status()
for job in response.json():
    # Skip jobs whose entry has no 'artifacts_file' key.
    if 'artifacts_file' in job:
        print("Erasing %d..." % job['id'])
        requests.post("https://server/api/v4/projects/%s/jobs/%d/erase?private_token=%s" % (project, job['id'], token))

Thanks to everyone here for their tidbits!

I’ve written a bash script that not only automates identifying which jobs have artifacts but also works with large numbers of jobs (the GitLab API has a 100-item page limit). The only requirements are bash, curl, and jq. I’ve tested it on macOS, CentOS, and Ubuntu.

To finish things out, I wrote a blog post discussing this issue and the proper method for handling artifact expiration. I’ve included my script as well for deleting already existing artifacts.
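
For anyone who prefers Python over bash and jq, the paging idea described above can be sketched roughly as follows. This is not the script from the post, just a standard-library illustration under a few assumptions: the helper names (`api_call`, `job_ids_with_artifacts`, `erase_all`) are hypothetical, the loop stops at the first empty page, and a job is treated as having artifacts when its `artifacts_file` entry is present and non-empty.

```python
import json
import urllib.request


def api_call(method, url, token):
    """Send one authenticated request and decode the JSON body, if there is one."""
    req = urllib.request.Request(url, method=method, headers={"PRIVATE-TOKEN": token})
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
    return json.loads(body) if body else None


def job_ids_with_artifacts(fetch_page):
    """Walk pages 1, 2, 3, ... until an empty page; yield IDs of jobs that list an artifacts file."""
    page = 1
    while True:
        jobs = fetch_page(page)
        if not jobs:
            return
        for job in jobs:
            if job.get("artifacts_file"):
                yield job["id"]
        page += 1


def erase_all(server, project, token, per_page=100):
    """Erase every job that still has artifacts; return the list of erased job IDs."""
    base = f"https://{server}/api/v4/projects/{project}/jobs"

    def fetch_page(page):
        return api_call("GET", f"{base}?per_page={per_page}&page={page}", token)

    erased = []
    for job_id in job_ids_with_artifacts(fetch_page):
        api_call("POST", f"{base}/{job_id}/erase", token)
        erased.append(job_id)
    return erased
```

Calling `erase_all("gitlab.com", "456", "secret")` (placeholder values) would page through the whole job list 100 entries at a time. Keeping `fetch_page` as a parameter of `job_ids_with_artifacts` means the paging logic can be tried against canned data before pointing it at a real server.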

PowerShell script for removing artifacts:

$project_id = "???"
$token = "???"
$server = "gitlab.com"

# Starts at page 2, so the first page of results is skipped; use $page = 1 to include it.
for ($page = 2; $page -lt 50; $page++)
{
    $url = "https://${server}/api/v4/projects/${project_id}/jobs?scope[]=success&scope[]=manual&per_page=100&page=${page}"
    Write-Host "Get Jobs ${url}"
    # --globoff stops curl from treating the [] in the query string as a glob pattern.
    $json = curl.exe --globoff --header "PRIVATE-TOKEN:${token}" "${url}" | ConvertFrom-Json

    foreach ($job in $json)
    {
        $job_id = $job.id
        Write-Host "Erase ${job_id}"
        curl.exe --request DELETE --header "PRIVATE-TOKEN:${token}" "https://${server}/api/v4/projects/${project_id}/jobs/${job_id}/artifacts"
    }
}