I have a GitLab-managed cluster running in GKE.
I'm getting a lot of job failures (timeouts, the mysql command not working, etc.). I think some of these are due to a lack of resources in the cluster (it has HPA and vertical autoscaling enabled).
**How do I make that better?**
**Why are there so many workloads hanging around for jobs that have failed?**
In some cases they stick around for days and have to be manually deleted.
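For context, the manual cleanup looks roughly like the sketch below. The namespace `gitlab-managed-apps` and the function names are assumptions for illustration (substitute wherever your runner creates its job pods), and the selector only matches pods whose phase is `Failed`; pods still reported as `Running` or `Pending` would need a different filter.

```shell
# Namespace is an assumption; override RUNNER_NAMESPACE to match your cluster.
RUNNER_NAMESPACE="${RUNNER_NAMESPACE:-gitlab-managed-apps}"

# List pods left behind in the Failed phase (hypothetical helper name).
list_failed_pods() {
  kubectl get pods -n "$RUNNER_NAMESPACE" --field-selector=status.phase=Failed
}

# Delete those same pods (hypothetical helper name).
delete_failed_pods() {
  kubectl delete pods -n "$RUNNER_NAMESPACE" --field-selector=status.phase=Failed
}
```

Running `list_failed_pods` first shows what `delete_failed_pods` would remove.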
What are you seeing, and how does that differ from what you expect to see?
I expect the workloads to exist only for the life of the job they are running.
What version are you on? Are you using self-managed or GitLab.com?
GitLab.com
- GitLab (Hint: /help): 13.3.0-pre 52083dab1f2
- Runner (Hint: /admin/runners): GKE gitlab-runner
Add the CI configuration from .gitlab-ci.yml and other configuration if relevant (e.g. docker-compose.yml)
I've manually set concurrent to 7 in config.toml.
Here's my config.toml:
concurrent = 7
cpu_limit = "3"
memory_limit = "4Gi"
service_cpu_limit = "2"
service_memory_limit = "2Gi"
helper_cpu_limit = "500m"
helper_memory_limit = "500Mi"
check_interval = 3
log_level = "info"
listen_address = ':9252'
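To put numbers on the suspected resource shortage, here is a rough worst-case calculation from the limits above. It is a sketch under stated assumptions: each job pod runs one build container, one service container (the single `mysql:5.7` service in the CI file) and one helper container, every container reaches its limit, and the `KUBERNETES_*` variables in .gitlab-ci.yml are not overriding these values. What the scheduler actually reserves depends on requests, which may differ.

```shell
#!/bin/sh
# Limits copied from config.toml, converted to millicores and Mi.
CONCURRENT=7
BUILD_CPU_M=3000      # cpu_limit = "3"
SERVICE_CPU_M=2000    # service_cpu_limit = "2"
HELPER_CPU_M=500      # helper_cpu_limit = "500m"
BUILD_MEM_MI=4096     # memory_limit = "4Gi"
SERVICE_MEM_MI=2048   # service_memory_limit = "2Gi"
HELPER_MEM_MI=500     # helper_memory_limit = "500Mi"

# Per-pod ceiling: build + one service + helper (assumed pod layout).
pod_cpu_m=$((BUILD_CPU_M + SERVICE_CPU_M + HELPER_CPU_M))
pod_mem_mi=$((BUILD_MEM_MI + SERVICE_MEM_MI + HELPER_MEM_MI))

# Cluster-wide ceiling when all concurrent slots are in use.
total_cpu_m=$((pod_cpu_m * CONCURRENT))
total_mem_mi=$((pod_mem_mi * CONCURRENT))

echo "per job pod: ${pod_cpu_m}m CPU, ${pod_mem_mi}Mi memory"
echo "at concurrent=${CONCURRENT}: ${total_cpu_m}m CPU, ${total_mem_mi}Mi memory"
```

With these inputs the ceiling is 5500m CPU and 6644Mi per job pod, or 38500m CPU and 46508Mi with all 7 slots busy, which is the figure to compare against what the node pool can provide.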
.gitlab-ci.yml:
image: gcr.io/$PROJECT/$CUSTOMUBUNTUIMAGE:$TAG
variables:
  GIT_DEPTH: 10
  GIT_STRATEGY: fetch
  REPO_NAME: $REPO
  MYSQL_DATABASE: reboot_tests
  MYSQL_ROOT_PASSWORD: docker
  REF_DESLASHED: ${CI_COMMIT_REF_NAME////_}
  PROJECT_NAME: $BUILD_IMAGE
  GCP_PROJECT_ID: $PROJECT
  SHORT_SHA: $(echo $CI_COMMIT_SHA | cut -c1-7)
  KUBERNETES_CPU_REQUEST: "1"
  KUBERNETES_CPU_LIMIT: "1.5"
  KUBERNETES_MEMORY_REQUEST: "2Gi"
  KUBERNETES_MEMORY_LIMIT: "2Gi"
stages:
  - backend
  - frontend
  - build
.job_template: &kube_backend # Hidden key that defines an anchor named 'kube_backend'
  timeout: "30 minute"
  retry: 2
  services:
    - mysql:5.7
  stage: backend
  only:
    - external_pull_requests
  tags:
    - kube
  variables:
    DBNAME: "reboot_tests"
  before_script:
    #- echo "y" | cpanm --notest --skip-satisfied Date::Business
    #- cpanm Alt::Crypt::RSA::BigInt App::cpanoutdated --quiet --notest --skip-satisfied
    #- cpanm --quiet --notest --skip-satisfied --installdeps .
    #- perl Makefile.PL
    #- cpanm SOAP::Lite --skip-satisfied
    - apt-get install mariadb-client -y --quiet
    - sleep 10
    - mysql -h mysql -uroot -p"$MYSQL_ROOT_PASSWORD" -e "SET GLOBAL sql_mode = '';"
    - export FRESH_DB=1
.job_template: &kube_frontend # Hidden key that defines an anchor named 'kube_frontend'
  image: gcr.io/$PROJECT/$IMAGE:$TAG
  services:
    - mysql:5.7
  stage: frontend
  timeout: "90 minute"
  retry: 1
  only:
    - external_pull_requests
  tags:
    - devel
  variables:
    DBNAME: "reboot_tests"
  before_script:
    - sudo /usr/bin/mysqld_safe --basedir=/usr &
    - echo "y" | cpanm --notest --skip-satisfied Date::Business
    - cpanm Alt::Crypt::RSA::BigInt App::cpanoutdated --quiet --notest --skip-satisfied
    - cpanm --quiet --notest --skip-satisfied --installdeps .
    - perl Makefile.PL
    - cpanm SOAP::Lite --skip-satisfied
    - export PATH=/usr/local/lib/nodejs/node-v10.9.0-linux-x64/bin:$PATH
    - sudo npm ci
    - npm version
    - mysql -h mysql -uroot -p"$MYSQL_ROOT_PASSWORD" -e "SET GLOBAL sql_mode = '';"
    - export FRESH_DB=1
    - prove -vl t/t_admin/sitemap.t
    - script/cetec_reboot_web_server.pl -rp 3001 &
    - sleep 15
frontend_testday:
  <<: *kube_frontend
  script:
    - $(npm bin)/cypress run --spec cypress/integration/testday/* --config baseUrl="http://localhost:3001"
  artifacts:
    paths:
      - cypress/videos/testday/
      - cypress/screenshots/testday/
    when: on_failure
    expire_in: 2 days
backend_part:
  <<: *kube_backend
  script:
    - prove -vl t/t_part/*
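One possible contributor to the mysql failures is the fixed `sleep 10` before the first `mysql` call in the backend before_script: if the service container takes longer than that to accept connections, the command fails. A minimal sketch of a retry loop that could stand in for the fixed sleep is below. `wait_for_mysql` is a hypothetical helper name, and the attempt count and 2-second delay are arbitrary assumptions; it reuses the same `mysql` client invocation and `MYSQL_ROOT_PASSWORD` variable as the config above.

```shell
# Poll the MySQL service until it answers a trivial query, or give up.
# Usage: wait_for_mysql <host> [max_attempts]
wait_for_mysql() {
  host="$1"
  tries="${2:-30}"   # assumed default: 30 attempts
  i=0
  while [ "$i" -lt "$tries" ]; do
    # Success means the server accepted the connection and ran the query.
    if mysql -h "$host" -uroot -p"$MYSQL_ROOT_PASSWORD" -e "SELECT 1" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 2          # assumed delay between attempts
  done
  echo "mysql at $host not ready after $tries attempts" >&2
  return 1
}
```

In before_script this would be called as `wait_for_mysql mysql` ahead of the `SET GLOBAL sql_mode` line. It fails the job with a clear message instead of a confusing client error when the service never comes up.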
What troubleshooting steps have you already taken? Can you link to any docs or other resources so we know where you have been?
I've changed the concurrent job limit, added the vertical autoscaler, and added the KUBERNETES_* resource variables in .gitlab-ci.yml.