Remove all artifacts (no expire options)

Hi all,

How do I clean up all the artifacts (from every test pipeline, created without an “expire_in” option) that take up a lot of space on my personal server?

Looking at the issues page, this appears to be a feature that has been worked on and is waiting for one more approval before being merged:

Until then, you would have to go to the folder on the server and delete the artifacts manually.

If you have hundreds of builds, doing it manually is not practical. You can use the REST API with curl and your favorite scripting language:

#!/bin/sh
project_id=456
token=secret
server=myserver
start_job=2
end_job=8
for job_id in $(seq $start_job $end_job)
do 
   curl --request POST --header "PRIVATE-TOKEN:${token}" "https://${server}/api/v4/projects/${project_id}/jobs/${job_id}/erase"
done

Thanks for this snippet. I tried it out and got a tonne of {"error": "404 not found"}. I’m hoping that’s from jobs which didn’t have artifacts.

After a little poking, I found that this snippet worked for me on macOS

#!/bin/bash
project_id="0000000"
token="teenagemutantninjaturtles"
server="gitlab.com"
start_job=30592507
end_job=30626126

for job_id in $(jot - $start_job $end_job)
do 
	URL="https://$server/api/v4/projects/$project_id/jobs/$job_id/erase"
	echo "$URL"
	curl --request POST --header "PRIVATE-TOKEN:${token}" "$URL"
	printf '\n'
done

I’ve made another version (in bash) that explicitly lists your own jobs (useful if you use GitLab CI SaaS), using the JavaScript browser console.

http://blog.fgribreau.com/2018/01/remove-gitlab-ci-artifacts-in-batch.html

Thanks for the hints. Your job id loop can be slow. It is better to query the job list first. I am using this python script now:

import requests
project = "my-project"
token = "mytoken"
response = requests.get("https://server/api/v4/projects/%s/jobs?private_token=%s" % (project,token))
response.raise_for_status()
for job in response.json():
    if 'artifacts_file' in job:
        print("Erasing %d..." % job['id'])
        requests.post("https://server/api/v4/projects/%s/jobs/%d/erase?private_token=%s" % (project,job['id'],token))

Thanks to everyone here for their tidbits!

I’ve written a bash script that not only identifies which jobs have artifacts automatically, but also works with large numbers of jobs (the GitLab API has a 100-item page limit). The only requirements are bash, curl, and jq. I’ve tested it on macOS, CentOS, and Ubuntu.
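
The script itself is not reproduced in this thread. As a rough illustration of the same idea (page through the job list 100 at a time and keep only jobs that carry artifacts), here is a minimal Python sketch. The function names are made up for this example, and the `artifacts_file` check is borrowed from the Python script posted earlier in the thread; treat it as a starting point, not a tested tool.

```python
import json
import urllib.parse
import urllib.request


def fetch_jobs_page(server, project_id, token, page, per_page=100):
    """Fetch one page of the project's job list (performs a real HTTP request)."""
    query = urllib.parse.urlencode({"page": page, "per_page": per_page})
    url = f"https://{server}/api/v4/projects/{project_id}/jobs?{query}"
    request = urllib.request.Request(url, headers={"PRIVATE-TOKEN": token})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def jobs_with_artifacts(fetch_page):
    """Yield the id of every job whose JSON contains an 'artifacts_file' key.

    fetch_page(page) must return the list of job dicts for that page;
    an empty list is treated as the end of the job list.
    """
    page = 1
    while True:
        jobs = fetch_page(page)
        if not jobs:
            return
        for job in jobs:
            if "artifacts_file" in job:
                yield job["id"]
        page += 1
```

To run it against a real server, pass something like `lambda page: fetch_jobs_page("gitlab.example.com", 456, "secret", page)` as `fetch_page` (server, project ID, and token are placeholders), then call the erase endpoint for each yielded ID. Keeping the HTTP call behind a callable makes the paging logic easy to try out with fake data first.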

To finish things out, I wrote a blog post discussing this issue and the proper method for handling artifact expiration. I’ve included my script as well for deleting already existing artifacts.

PowerShell script to remove artifacts:

$project_id="???"
$token="???"
$server="gitlab.com"

# NOTE: the loop starts at page 2, so jobs on page 1 are skipped; use $page=1 to include them.
for($page=2; $page -lt 50; $page++)
{
$url = "https://${server}/api/v4/projects/${project_id}/jobs?scope[]=success&scope[]=manual&per_page=100&page=${page}"
Write-Host "Get Jobs ${url}"
$json = curl.exe --globoff --header "PRIVATE-TOKEN:${token}" "${url}" | ConvertFrom-Json

foreach($job in $json)
{ 
    $job_id = $job.id
    Write-Host "Erase ${job_id}"
    curl.exe --request DELETE --header "PRIVATE-TOKEN:${token}" "https://${server}/api/v4/projects/${project_id}/jobs/${job_id}/artifacts"
}

}

import requests
import json

class BearerAuth(requests.auth.AuthBase):
    def __init__(self, token):
        self.token = token
    def __call__(self, r):
        r.headers["authorization"] = "Bearer " + self.token
        return r

project = 'project_id'
token='token'

url = f'https://gitlab.com/api/v4/projects/{project}/jobs'
response = requests.get(url, auth=BearerAuth(token))

data= json.loads(response.text)

for item in data:
    # The Jobs API endpoint that erases a job (artifacts and log) is "erase".
    url=f'https://gitlab.com/api/v4/projects/{project}/jobs/{item["id"]}/erase'
    response = requests.post(url, auth=BearerAuth(token))

The above is a Python script to remove all artifacts of the project.

I improved the bash script to make it more maintainable. It depends only on jq and was tested on Linux:

The updates:

  1. Query all jobs and obtain a list from server.
  2. Delete every job and its artifacts in the list, page by page.
  3. No need to pull in big batteries (e.g. programming language).
  4. Works for large lists of jobs (mine had 1325 jobs with randomized IDs on gitlab.com).

#!/bin/bash
# Copyright 2021 "Holloway" Chew, Kean Ho <kean.ho.chew@zoralab.com>
# Copyright 2020 Benny Powers (https://forum.gitlab.com/u/bennyp/summary)
# Copyright 2017 Adam Boseley (https://forum.gitlab.com/u/adam.boseley/summary)
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


##############
# user input #
##############
# project ID (Help: go to "Settings" > "General")
projectID=""

# user API token (Help: "User Settings" > "Access Tokens" > enable "API")
token=""

# gitlab server instance (E.g. 'gitlab.com')
server="gitlab.com"

# CI Jobs pagination (Help: "CI/CD" > "Jobs" > see the pagination bar at the bottom)
#
# NOTE: the user interface might be buggy. If so, calculate the page count manually.
# Example:
#   1. For 123 jobs in the past, per_page is "100", it has 2 pages in total
#      [Pages = ROUND_UP(123 / 100)].
start_page="1"
end_page="100"
per_page="100"

# GitLab API version
api="v4"

#####################
# internal function #
#####################
delete() {
        # page
        page="$1"
        1>&2 printf "Cleaning page ${page}...\n"

        # build internal variables
        baseURL="https://${server}/api/${api}/projects"

        # get list from servers for the page
        url="${baseURL}/${projectID}/jobs/?page=${page}&per_page=${per_page}"
        1>&2 printf "Calling API to get job list: ${url}\n"

        # list is a whitespace-separated string of job IDs (empty if none)
        list=$(curl --globoff --header "PRIVATE-TOKEN:${token}" "$url" \
                | jq -r ".[].id")
        if [ -z "$list" ]; then
                1>&2 printf "list is empty\n"
                return 0
        fi

        # remove all jobs from page
        for jobID in ${list[@]}; do
                url="${baseURL}/${projectID}/jobs/${jobID}/erase"
                1>&2 printf "Calling API to erase job: ${url}\n"

                curl --request POST --header "PRIVATE-TOKEN:${token}" "$url"
                1>&2 printf "\n\n"
        done
}

main() {
        # check dependencies
        if [ -z "$(type -p jq)" ]; then
                1>&2 printf "[ ERROR ] need 'jq' dependency to parse json."
                exit 1
        fi

        # loop through each pages from given start_page to end_page inclusive
        for ((i=start_page; i<=end_page; i++)); do
                delete $i
        done

        # return
        exit 0
}
main "$@"
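
The `end_page` value in the script comes from the `ROUND_UP(total / per_page)` rule in its comments. As a quick check of that arithmetic, here is a small Python sketch; `pages_needed` is a name invented for this example. GitLab's paginated responses usually also include an `X-Total-Pages` header, which may save the manual count, though it can be missing for very large collections.

```python
import math


def pages_needed(total_jobs, per_page=100):
    """Number of pages needed to cover total_jobs at per_page jobs per page."""
    if per_page <= 0:
        raise ValueError("per_page must be positive")
    return math.ceil(total_jobs / per_page)


print(pages_needed(123))   # 2
print(pages_needed(1325))  # 14
```

Setting `end_page` higher than necessary is harmless with the script above, since an empty page simply returns early.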

Tested the script. [Screenshots: before and after]

This was helpful: Jobs artifacts administration | GitLab

As described there, this lists the 20 projects with the largest artifact storage usage:

$ gitlab-rails console
include ActionView::Helpers::NumberHelper
ProjectStatistics.order(build_artifacts_size: :desc).limit(20).each do |s|
  puts "#{number_to_human_size(s.build_artifacts_size)} \t #{s.project.full_path}"
end

Then for each project:

project = Project.find_by_full_path('path/to/project')
builds_with_artifacts =  project.builds.with_downloadable_artifacts
builds_to_clear = builds_with_artifacts.where("finished_at < ?", 1.week.ago)
builds_to_clear.find_each do |build|
  build.artifacts_expire_at = Time.now
  build.erase_erasable_artifacts!
end

Other variations are in the same manual.
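
Without Rails console access, a similar age filter can be applied to the job list returned by the REST API. The following Python sketch only does the filtering step; `parse_gitlab_time` and `jobs_to_clear` are names invented here, and it assumes each job dict has a `finished_at` ISO 8601 timestamp (or null for unfinished jobs), as in the API's job objects.

```python
from datetime import datetime, timedelta, timezone


def parse_gitlab_time(value):
    """Parse an ISO 8601 timestamp such as '2024-01-31T12:00:00.000Z'."""
    # Older Python versions do not accept a trailing 'Z' in fromisoformat().
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def jobs_to_clear(jobs, max_age=timedelta(weeks=1), now=None):
    """Return the ids of jobs that finished more than max_age ago."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - max_age
    return [
        job["id"]
        for job in jobs
        if job.get("finished_at")
        and parse_gitlab_time(job["finished_at"]) < cutoff
    ]
```

Unlike the Rails snippet, this does not check whether a job actually has downloadable artifacts, so it is best combined with a check such as the `artifacts_file` test used in the Python script earlier in the thread.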

For some reason, the API call didn’t work as expected for me:

curl --request DELETE --header "PRIVATE-TOKEN: <token>" "https://gitlab.example.com/api/v4/projects/<project_id>/artifacts"

Although I got a 202 response, which should be good, the project’s space usage was unchanged after the call.
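
A 202 status only means the request was accepted for processing, not that the deletion has finished. One way to watch the effect is to read the project's statistics before and after the call. The sketch below assumes the project endpoint is queried with `statistics=true` and that the response contains `statistics.job_artifacts_size` in bytes; the function names are invented for this example, and the statistics may be refreshed with some delay and are only returned to users with sufficient permissions.

```python
import json
import urllib.request


def fetch_project(server, project_id, token):
    """Fetch project details including storage statistics (real HTTP request)."""
    url = f"https://{server}/api/v4/projects/{project_id}?statistics=true"
    request = urllib.request.Request(url, headers={"PRIVATE-TOKEN": token})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def artifacts_size_mib(project):
    """Return the job artifacts size in MiB from a project API response."""
    size_bytes = project["statistics"]["job_artifacts_size"]
    return size_bytes / (1024 * 1024)
```

Calling `artifacts_size_mib(fetch_project("gitlab.example.com", 456, "secret"))` a few minutes apart (placeholders for server, project ID, and token) should show whether the size is actually going down.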

I’ve created a tool to do that. If you need it, here is the URL of the project:
https://github.com/rafaelperoco/gitlab-artifacts-cleaner

Thanks everyone for sharing your solutions. I have written a Python script that uses python-gitlab to query the GitLab API, and analyse and/or delete job artifacts. Optionally filtered by size or date. Can be used for a project or group of projects, including sub groups. CLI or Docker container. More in

Did not try it yet, looks promising with queuing and raw API calls.

This worked for me, thanks!

Thanks for your example. I edited it slightly to use the new endpoint and HTTP method, because the original no longer seems to work on the latest version of GitLab:

#!/bin/sh
project_id=42
token="secret"
server="myserver"
start_job=0
end_job=1000
for job_id in $(seq $start_job $end_job)
do
    echo "Remove on job ${job_id}"
    curl --request DELETE --header "PRIVATE-TOKEN:${token}" "https://${server}/api/v4/projects/${project_id}/jobs/${job_id}/artifacts"
done

@dnsmichi @lp177 @CeesLuo @rafaelperoco @franko1008
This is now integrated natively.
I found it in the following doc: artifact administration.

You just need to uncomment the following line, gitlab_rails['expire_build_artifacts_worker_cron'] = "*/7 * * * *", and set your preferred value in the GitLab config file.