I have GitLab Runner app version 16.1.0 and version 0.54.0 of the Kubernetes executor installed in a GKE cluster with a custom node pool with 6 CPU and 1 GB memory. The runner configuration is as below:
affinity: {}
checkInterval: 60
concurrent: 10
configMaps: {}
gitlabUrl: https://git.example.com/
hostAliases: []
image:
  image: gitlab-org/gitlab-runner
  registry: registry.gitlab.com
imagePullPolicy: IfNotPresent
metrics:
  enabled: false
  port: 9252
  portName: metrics
  serviceMonitor:
    enabled: false
nodeSelector:
  solr: prom
podAnnotations: {}
podLabels: {}
podSecurityContext:
  fsGroup: 65533
  runAsUser: 100
priorityClassName: ""
rbac:
  clusterWideAccess: false
  create: true
  podSecurityPolicy:
    enabled: false
    resourceNames:
    - gitlab-runner
  rules:
  - resources:
    - configmaps
    - pods
    - pods/attach
    - secrets
    - services
    verbs:
    - get
    - list
    - watch
    - create
    - patch
    - update
    - delete
  - apiGroups:
    - ""
    resources:
    - pods/exec
    verbs:
    - create
    - patch
    - delete
resources: {}
runnerRegistrationToken:
runners:
  cache:
    secretName: google-application-credentials
  config: |
    [[runners]]
      output_limit = 10240
      [runners.kubernetes]
        namespace = "{{.Release.Namespace}}"
        image = "ubuntu:20.04"
        privileged = true
        [runners.kubernetes.node_selector]
          solr = "prom"
      [runners.cache]
        Type = "gcs"
        Path = "cache"
        Shared = false
        [runners.cache.gcs]
          AccessID = ""
          PrivateKey = "-----BEGIN PRIVATE KEY-----\\n-----END PRIVATE KEY-----\n"
          BucketName = "node-modules-caching"
  name: gitlab-runner
  tags: k8s-runner
secrets: []
securityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop:
    - ALL
  privileged: false
  readOnlyRootFilesystem: false
  runAsNonRoot: true
service:
  enabled: false
  type: ClusterIP
sessionServer:
  enabled: false
terminationGracePeriodSeconds: 3600
tolerations: []
useTini: false
volumeMounts: []
volumes: []
When pipelines run one after another, everything works fine. But if I start 7-10 pipelines at the same time, the jobs execute properly and then get stuck after the job completes, at the cache creation step. You can see this in the image below.
If I cancel all 7 pipelines and run them one after another, they complete. There is nothing in the logs either. Can anyone help me understand what is happening here?
Docker dind is also running alongside this. Its deployment is as follows:
## Deployment for docker-dind
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: docker-dind
  name: docker-dind
  namespace: gitlab-runner
spec:
  replicas: 1
  selector:
    matchLabels:
      app: docker-dind
  template:
    metadata:
      labels:
        app: docker-dind
    spec:
      containers:
      - image: docker:19.03-dind
        name: docker-dind
        env:
        - name: DOCKER_HOST
          value: tcp://docker-dind:2375/
        - name: DOCKER_TLS_CERTDIR # Disable TLS as traffic is not going outside of the network.
          value: ""
        volumeMounts:
        - name: docker-dind-data-vol # Persist the Docker data
          mountPath: /var/lib/docker/
        ports:
        - name: daemon-port
          containerPort: 2375
          protocol: TCP
        securityContext:
          privileged: true # Required for the dind container to work.
      nodeSelector:
        solr: prom
      volumes:
      - name: docker-dind-data-vol
        persistentVolumeClaim:
          claimName: docker-dind-data
Besides jobs getting stuck on creating the cache, if 8-10 pipelines are already running and a new pipeline is started, its pod goes into the Pending state and the job fails:
ERROR: Job failed (system failure): prepare environment: waiting for pod running: timed out waiting for pod to start.
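
For reference, the `[runners.kubernetes]` section in the config above sets no resource requests and no poll timeout for the build pods. Below is a minimal sketch of what those settings might look like, assuming the documented Kubernetes executor keys `cpu_request`, `memory_request`, `helper_cpu_request`, `helper_memory_request` and `poll_timeout`. All values are illustrative guesses and have not been tested on this cluster, and whether they affect the stuck cache step is an open question.

```toml
[[runners]]
  output_limit = 10240
  [runners.kubernetes]
    namespace = "{{.Release.Namespace}}"
    image = "ubuntu:20.04"
    privileged = true
    # Assumed values: seconds to wait for a pod to reach Running before the
    # "timed out waiting for pod to start" failure (the default is 180).
    poll_timeout = 600
    # Assumed values: explicit requests so the scheduler can tell how many
    # build pods fit on the node pool at once.
    cpu_request = "500m"
    memory_request = "256Mi"
    helper_cpu_request = "100m"
    helper_memory_request = "64Mi"
    [runners.kubernetes.node_selector]
      solr = "prom"
```

With requests set, running `kubectl describe pod` on a pod stuck in Pending should show in its events whether scheduling fails for lack of CPU or memory on the `solr: prom` nodes.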