We are experiencing random build failures on our Kubernetes GitLab runner. In our build, we execute unit, integration, and acceptance tests. However, the build pods crash without any clear reason.
Steps Taken to Resolve:
- Increased Resources: Initially, the runner had 16 CPU and 64GB RAM, and we upgraded it to 32 CPU and 128GB RAM, but the issue persists.
- Increased Volumes: We attempted increasing storage volumes, but the crashes still occur randomly.
- Enabled Cluster Monitoring: We have enabled cluster monitoring, checked logs in Grafana, and monitored resource usage, but we do not observe any significant spikes before the failures.
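Since the dashboards show no spikes, the termination reason recorded on the pods themselves may be more telling. Below is a minimal sketch of how that could be collected, assuming the output of `kubectl get pods -o json` for the runner namespace has been parsed into a dict. The function name `termination_reasons` and the sample data are hypothetical and not taken from our cluster; the field names (`status.reason`, `containerStatuses[].state.terminated`, `lastState.terminated`) are the ones from the Kubernetes Pod status.

```python
def termination_reasons(pod_list):
    """Collect termination details from a parsed `kubectl get pods -o json` result.

    Returns a list of (pod, container, reason, exit_code) tuples.
    """
    findings = []
    for pod in pod_list.get("items", []):
        pod_name = pod.get("metadata", {}).get("name", "<unknown>")
        status = pod.get("status", {})

        # A pod-level reason is set for pod-wide events, e.g. "Evicted".
        if status.get("reason"):
            findings.append((pod_name, None, status["reason"], None))

        for cs in status.get("containerStatuses", []):
            # Look at the current state first, then the previous one.
            for key in ("state", "lastState"):
                terminated = cs.get(key, {}).get("terminated")
                if terminated:
                    findings.append((
                        pod_name,
                        cs.get("name"),
                        terminated.get("reason"),
                        terminated.get("exitCode"),
                    ))
    return findings


# Hypothetical sample data for illustration only.
sample = {
    "items": [
        {
            "metadata": {"name": "runner-abc"},
            "status": {
                "containerStatuses": [
                    {
                        "name": "build",
                        "state": {
                            "terminated": {"reason": "OOMKilled", "exitCode": 137}
                        },
                    }
                ]
            },
        },
        {
            "metadata": {"name": "runner-def"},
            "status": {"reason": "Evicted"},
        },
    ]
}

for finding in termination_reasons(sample):
    print(finding)
```

A reason such as `OOMKilled` would point at the memory limit of the build container rather than at node capacity, while `Evicted` would point at node pressure (for example ephemeral storage). The pods have to be inspected before they are cleaned up, otherwise this status is no longer available.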
Despite these efforts, the issue remains unresolved. We need assistance in identifying the root cause and resolving these random build failures.
Versions
- GitLab.com SaaS
- GitLab 16, 17
Kubernetes Version
- 31