Hi all,
How do I clean up all the artefacts (from every test pipeline, created without an "expire_in" option) that take up a lot of space on my personal server?
Looking at the issues page, this appears to be a feature that has been worked on and is waiting for one more approval before being merged:
Until then, you would have to go to the folder on the server and delete the artifacts manually.
If you have hundreds of builds, doing it manually is not practical. You can use the REST API with curl and your favorite scripting language:
#!/bin/sh
# Erase every job with an ID from start_job to end_job
# (the erase endpoint removes the job's artifacts and its log)
project_id=456
token=secret
server=myserver
start_job=2
end_job=8
for job_id in $(seq $start_job $end_job)
do
    curl --request POST --header "PRIVATE-TOKEN:${token}" "https://${server}/api/v4/projects/${project_id}/jobs/${job_id}/erase"
done
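For comparison, the same loop can be sketched in Python using only the standard library. This is an illustrative sketch, not part of the original post: `erase_job_urls`, `erase_jobs`, and the `dry_run` flag are hypothetical names, and the sketch defaults to a dry run so nothing is erased until you explicitly pass `dry_run=False`.

```python
import urllib.error
import urllib.request


def erase_job_urls(server, project_id, start_job, end_job):
    """Build the erase-endpoint URL for every job ID from start_job to end_job."""
    return [
        f"https://{server}/api/v4/projects/{project_id}/jobs/{job_id}/erase"
        for job_id in range(start_job, end_job + 1)  # inclusive, like `seq`
    ]


def erase_jobs(server, project_id, token, start_job, end_job, dry_run=True):
    """POST to each erase URL; with dry_run=True only print what would be sent."""
    urls = erase_job_urls(server, project_id, start_job, end_job)
    for url in urls:
        if dry_run:
            print("would POST", url)
            continue
        req = urllib.request.Request(
            url, method="POST", headers={"PRIVATE-TOKEN": token}
        )
        try:
            with urllib.request.urlopen(req) as resp:
                print(resp.status, url)
        except urllib.error.HTTPError as err:
            # e.g. 404 when the ID is not a job of this project
            print(err.code, url)
    return urls
```

Running `erase_jobs("myserver", 456, "secret", 2, 8)` only prints the seven URLs it would call; add `dry_run=False` once the list looks right.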
Thanks for this snippet. I tried it out and got a tonne of {"error": "404 not found"}. I’m hoping that’s from jobs which didn’t have artifacts.
After a little poking, I found that this snippet worked for me on macOS:
#!/bin/bash
project_id="0000000"
token="teenagemutantninjaturtles"
server="gitlab.com"
start_job=30592507
end_job=30626126
for job_id in $(jot - $start_job $end_job)
do
URL="https://$server/api/v4/projects/$project_id/jobs/$job_id/erase"
echo "$URL"
curl --request POST --header "PRIVATE-TOKEN:${token}" "$URL"
printf '\n'  # bash's echo prints "\n" literally, so use printf for the newline
done
I’ve made another version (in bash) that explicitly lists your own jobs using the browser's JavaScript console (useful if you use GitLab CI SaaS):
http://blog.fgribreau.com/2018/01/remove-gitlab-ci-artifacts-in-batch.html
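Thanks for the hints. Looping over a range of job IDs can be slow (job IDs are shared across the whole instance, so most IDs in a range may belong to other projects). It is better to query the project's job list first. I am using this Python script now:
import requests
project = "my-project"  # numeric project ID or URL-encoded path (e.g. "group%2Fmy-project")
token = "mytoken"
# Fetch the project's job list (only the first page, 20 jobs by default)
response = requests.get("https://server/api/v4/projects/%s/jobs?private_token=%s" % (project, token))
response.raise_for_status()
for job in response.json():
    # Only jobs that still have an artifacts archive need erasing
    if 'artifacts_file' in job:
        print("Erasing %d..." % job['id'])
        requests.post("https://server/api/v4/projects/%s/jobs/%d/erase?private_token=%s" % (project, job['id'], token))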
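The script above only sees the first page of jobs. A minimal sketch of walking every page, assuming GitLab's `X-Next-Page` response header (empty on the last page) drives the loop; `iter_jobs` and the `fetch_page` callback are hypothetical names, and the HTTP call is left to the caller so the paging logic can be tested on its own.

```python
from typing import Callable, Dict, Iterator, List, Tuple

# fetch_page(page) -> (jobs on that page, value of the X-Next-Page header)
FetchPage = Callable[[int], Tuple[List[Dict], str]]


def iter_jobs(fetch_page: FetchPage) -> Iterator[Dict]:
    """Yield every job across all pages, following X-Next-Page until it is empty."""
    page = 1
    while True:
        jobs, next_page = fetch_page(page)
        yield from jobs
        if not next_page:  # an empty X-Next-Page means this was the last page
            break
        page = int(next_page)
```

With `requests`, `fetch_page` could wrap `requests.get(url, headers={"PRIVATE-TOKEN": token}, params={"per_page": 100, "page": page})` and return `(r.json(), r.headers.get("X-Next-Page", ""))`.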
Thanks to everyone here for their tidbits!
I’ve written a bash script that not only automates identifying which jobs have artifacts but also works with large numbers of jobs (the GitLab API has a 100-item page limit). The only requirements are bash, curl, and jq. I’ve tested it on macOS, CentOS, and Ubuntu.
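The "which jobs have artifacts" step can be sketched as a filter over the job objects the API returns. This assumes each job carries an `artifacts` list whose entries have a `size` in bytes, as in the Jobs API response; `jobs_with_artifacts` and `min_bytes` are hypothetical names used only for illustration.

```python
def jobs_with_artifacts(jobs, min_bytes=0):
    """Return (job_id, total_artifact_bytes) for jobs whose artifacts exceed min_bytes."""
    result = []
    for job in jobs:
        # Assumed shape: job["artifacts"] = [{"file_type": ..., "size": ...}, ...]
        total = sum(a.get("size") or 0 for a in job.get("artifacts") or [])
        if total > min_bytes:
            result.append((job["id"], total))
    return result
```

Raising `min_bytes` limits the clean-up to the jobs that actually use significant space.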
To finish things out, I wrote a blog post discussing this issue and the proper method for handling artifact expiration. I’ve included my script as well for deleting already existing artifacts.
PowerShell script to remove artifacts:
$project_id = "???"
$token = "???"
$server = "gitlab.com"
# Starting at page 2 leaves the first page (the 100 most recent jobs) untouched
for ($page = 2; $page -lt 50; $page++)
{
    # Several scopes must be passed as repeated scope[] parameters
    $url = "https://${server}/api/v4/projects/${project_id}/jobs?scope[]=success&scope[]=manual&per_page=100&page=${page}"
    Write-Host "Get Jobs ${url}"
    $json = curl.exe --globoff --header "PRIVATE-TOKEN:${token}" "${url}" | ConvertFrom-Json
    foreach ($job in $json) {
        $job_id = $job.id
        Write-Host "Erase ${job_id}"
        curl.exe --request DELETE --header "PRIVATE-TOKEN:${token}" "https://${server}/api/v4/projects/${project_id}/jobs/${job_id}/artifacts"
    }
}
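The multi-scope query string used above can be sketched in Python as well. This assumes GitLab reads several scopes from repeated `scope[]` parameters; `jobs_list_url` is a hypothetical helper, and `urlencode` percent-encodes the brackets so no `--globoff`-style escaping is needed.

```python
from urllib.parse import urlencode


def jobs_list_url(server, project_id, page, scopes=("success", "manual"), per_page=100):
    """Build a jobs-list URL that filters on several scopes at once."""
    # One ("scope[]", value) pair per scope; the brackets become %5B%5D
    params = [("scope[]", s) for s in scopes]
    params += [("per_page", per_page), ("page", page)]
    return f"https://{server}/api/v4/projects/{project_id}/jobs?{urlencode(params)}"
```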
Python script to remove all artifacts of the project:
import requests
class BearerAuth(requests.auth.AuthBase):
    """Attach the access token as an "Authorization: Bearer" header."""
    def __init__(self, token):
        self.token = token
    def __call__(self, r):
        r.headers["authorization"] = "Bearer " + self.token
        return r
project = 'project_id'
token = 'token'
url = f'https://gitlab.com/api/v4/projects/{project}/jobs'
# Only the first page of jobs (20 by default) is returned here
response = requests.get(url, auth=BearerAuth(token))
response.raise_for_status()
for item in response.json():
    # There is no /clear endpoint; /erase removes the job's artifacts and log
    url = f'https://gitlab.com/api/v4/projects/{project}/jobs/{item["id"]}/erase'
    response = requests.post(url, auth=BearerAuth(token))
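Both Python scripts put the project into the URL directly. The API accepts either the numeric project ID or the URL-encoded project path, so a path like `group/my-project` must have its slashes encoded. A small sketch; `project_ref` is a hypothetical helper name.

```python
from urllib.parse import quote


def project_ref(project):
    """Return a value usable as :id in /api/v4/projects/:id/..."""
    project = str(project)
    # Numeric IDs are used as-is; paths are fully encoded so "/" becomes "%2F"
    return project if project.isdigit() else quote(project, safe="")
```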
I improved the bash script to be more maintainable. It depends only on curl and jq, and has been tested on Linux:
The updates:
#!/bin/bash
# Copyright 2021 "Holloway" Chew, Kean Ho <kean.ho.chew@zoralab.com>
# Copyright 2020 Benny Powers (https://forum.gitlab.com/u/bennyp/summary)
# Copyright 2017 Adam Boseley (https://forum.gitlab.com/u/adam.boseley/summary)
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############
# user input #
##############
# project ID (Help: go to "Settings" > "General")
projectID=""
# user API token (Help: "User Settings" > "Access Tokens" > enable "API")
token=""
# gitlab server instance (E.g. 'gitlab.com')
server="gitlab.com"
# CI Jobs pagination (Help: "CI/CD" > "Jobs" > see the pagination bar at the bottom)
#
# NOTE: the user interface might be buggy. If so, calculate it manually.
# Example:
# 1. For 123 jobs in the past, per_page is "100", it has 2 pages in total
# [Pages = ROUND_UP(123 / 100)].
start_page="1"
end_page="100"
per_page="100"
# GitLab API version
api="v4"
#####################
# internal function #
#####################
delete() {
# page
page="$1"
1>&2 printf "Cleaning page ${page}...\n"
# build internal variables
baseURL="https://${server}/api/${api}/projects"
# get list from servers for the page
url="${baseURL}/${projectID}/jobs?page=${page}&per_page=${per_page}"
1>&2 printf "Calling API to get job list: ${url}\n"
list=$(curl --globoff --header "PRIVATE-TOKEN:${token}" "$url" \
| jq -r ".[].id")
# $list is a plain string of IDs, so test it for emptiness directly
if [ -z "$list" ]; then
1>&2 printf "list is empty\n"
return 0
fi
# remove all jobs from page
for jobID in ${list[@]}; do
url="${baseURL}/${projectID}/jobs/${jobID}/erase"
1>&2 printf "Calling API to erase job: ${url}\n"
curl --request POST --header "PRIVATE-TOKEN:${token}" "$url"
1>&2 printf "\n\n"
done
}
main() {
# check dependencies
if [ -z "$(type -p jq)" ]; then
1>&2 printf "[ ERROR ] need 'jq' dependency to parse json.\n"
exit 1
fi
# loop through each pages from given start_page to end_page inclusive
for ((i=start_page; i<=end_page; i++)); do
delete $i
done
# return
exit 0
}
main "$@"
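Instead of the hard-coded `end_page="100"`, the script's own comment suggests computing the page count as ROUND_UP(jobs / per_page). A sketch of that arithmetic; `page_range` is a hypothetical name, and the total job count is something you supply (for example, read from the Jobs page).

```python
def page_range(total_jobs, per_page=100, start_page=1):
    """Pages to visit so every job is covered: 1..ROUND_UP(total_jobs / per_page)."""
    last_page = -(-total_jobs // per_page)  # ceiling division without floats
    return range(start_page, last_page + 1)
```

For the comment's example of 123 jobs, `list(page_range(123))` gives `[1, 2]`.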
Tested the script: [before/after screenshots]
This was helpful: the "Jobs artifacts administration" page in the GitLab docs.
As described there, you can list the 20 projects using the most artifact storage from the Rails console:
$ gitlab-rails console
include ActionView::Helpers::NumberHelper
ProjectStatistics.order(build_artifacts_size: :desc).limit(20).each do |s|
  puts "#{number_to_human_size(s.build_artifacts_size)} \t #{s.project.full_path}"
end
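Without console access (for example on GitLab.com), a similar ranking can be sketched over project data fetched from the Projects API with `statistics=true`. The field names `statistics.job_artifacts_size` and `path_with_namespace` are assumed from that API's response; `top_projects` and `human_size` are hypothetical helpers, and `human_size` is a simple 1024-based formatter, not identical to Rails' `number_to_human_size`.

```python
def human_size(num_bytes):
    """Format a byte count with 1024-based units."""
    if num_bytes < 1024:
        return f"{num_bytes} Bytes"
    size = float(num_bytes)
    for unit in ("KB", "MB", "GB", "TB"):
        size /= 1024
        if size < 1024 or unit == "TB":
            return f"{size:.1f} {unit}"


def top_projects(projects, limit=20):
    """Rank project dicts by artifact storage, largest first."""
    def artifacts(p):
        return p["statistics"]["job_artifacts_size"]

    ranked = sorted(projects, key=artifacts, reverse=True)
    return [(human_size(artifacts(p)), p["path_with_namespace"]) for p in ranked[:limit]]
```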
Then for each project:
project = Project.find_by_full_path('path/to/project')
builds_with_artifacts = project.builds.with_downloadable_artifacts
builds_to_clear = builds_with_artifacts.where("finished_at < ?", 1.week.ago)
builds_to_clear.find_each do |build|
  build.artifacts_expire_at = Time.now
  build.erase_erasable_artifacts!
end
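The `finished_at < 1.week.ago` filter can also be applied through the REST API by checking each job's `finished_at` timestamp before erasing it. A sketch assuming the ISO 8601 timestamps (ending in `Z`) that the Jobs API returns; `finished_before` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone


def finished_before(jobs, cutoff=None):
    """Return IDs of jobs whose finished_at is earlier than cutoff (default: one week ago)."""
    if cutoff is None:
        cutoff = datetime.now(timezone.utc) - timedelta(weeks=1)
    ids = []
    for job in jobs:
        stamp = job.get("finished_at")
        if not stamp:  # jobs that have not finished have no timestamp
            continue
        # Replace the trailing "Z" so older Pythons' fromisoformat accepts it
        finished = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
        if finished < cutoff:
            ids.append(job["id"])
    return ids
```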
Other variations are described on the same documentation page.
For some reason, this API call didn’t work for me as expected:
curl --request DELETE --header "PRIVATE-TOKEN: <token>" "https://gitlab.example.com/api/v4/projects/<project_id>/artifacts"
Although I got a 202 response, which should mean success, the project's storage usage was the same after the call.
I’ve created a tool to do that. If any of you need it, here is the project URL:
https://github.com/rafaelperoco/gitlab-artifacts-cleaner
Thanks everyone for sharing your solutions. I have written a Python script that uses python-gitlab to query the GitLab API and analyse and/or delete job artifacts, optionally filtered by size or date. It can be used for a single project or a group of projects, including subgroups, either from the CLI or as a Docker container. More in
I haven't tried it yet, but it looks promising with queuing and raw API calls.
This worked for me, thanks!
Thanks for your example. I edited it slightly to use the newer endpoint and HTTP method, because the original no longer seems to work on the latest version of GitLab:
#!/bin/sh
project_id=42
token="secret"
server="myserver"
start_job=0
end_job=1000
for job_id in $(seq $start_job $end_job)
do
echo "Remove on job ${job_id}"
curl --request DELETE --header "PRIVATE-TOKEN:${token}" "https://${server}/api/v4/projects/${project_id}/jobs/${job_id}/artifacts"
done
@dnsmichi @lp177 @CeesLuo @rafaelperoco @franko1008
This is now integrated natively. I found it in the job artifacts administration docs: you just need to uncomment the following line in the GitLab config file (/etc/gitlab/gitlab.rb on Omnibus installs), set your preferred schedule, and run gitlab-ctl reconfigure:
gitlab_rails['expire_build_artifacts_worker_cron'] = "*/7 * * * *"