Self-managed GitLab on Kubernetes sometimes gives slow responses or 502 Bad Gateway on the web UI

Hello,

Our GitLab (v14.1.5) is hosted on Kubernetes (AWS EKS) and was installed with the Helm chart.
We have a deployment called gitlab-webservice-default, which has 1 ReplicaSet with 2 desired pods.

Sometimes the web UI becomes very slow, unresponsive, or unreachable with a 502 Bad Gateway error. When this happens, kubectl get pod shows the gitlab-webservice-default pods at 1/2 in the READY column, meaning one of the two containers in the pod is not ready.
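
Since 1/2 READY only says that some container is not ready, it can help to record which container it is and how often it has restarted each time the outage happens. Below is a minimal sketch, assuming you save the output of `kubectl get pods -l app=webservice -o json` to a file (the `app=webservice` label selector is an assumption about the chart's pod labels; check with `kubectl get pods --show-labels`). It only reads the standard `status.containerStatuses[].ready` and `restartCount` fields; the pod and container names in the demo data are illustrative.

```python
import json


def unready_containers(pod_list: dict) -> dict:
    """Map each pod name to [(container, restart_count), ...] for containers not ready.

    `pod_list` is the parsed JSON from `kubectl get pods -o json`.
    """
    result = {}
    for pod in pod_list.get("items", []):
        statuses = pod.get("status", {}).get("containerStatuses", [])
        # A container counts as unready when its status reports ready == false.
        not_ready = [
            (c["name"], c.get("restartCount", 0))
            for c in statuses
            if not c.get("ready", False)
        ]
        if not_ready:
            result[pod["metadata"]["name"]] = not_ready
    return result


# Demo with hypothetical data shaped like `kubectl get pods -o json` output.
demo = {
    "items": [
        {
            "metadata": {"name": "gitlab-webservice-default-example-a"},
            "status": {
                "containerStatuses": [
                    {"name": "webservice", "ready": False, "restartCount": 3},
                    {"name": "gitlab-workhorse", "ready": True, "restartCount": 0},
                ]
            },
        }
    ]
}
print(json.dumps(unready_containers(demo), indent=2))
```

With a saved file, call it as `unready_containers(json.load(open("pods.json")))`. A rising restart count on one container would point at its liveness probe failing rather than at the pod being rescheduled.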

  • We currently have 2,600 users (around 1,500 of them active).
  • The database has 1 vCPU and 2 GB of memory.
  • Each gitlab-webservice-default pod has 4 CPUs and 12 GB of memory.

The issue happens unpredictably, with no unusual spikes in CPU, memory, or database connections. Below are some of the logs we collected during an outage.

. . . .
Logs from the webservice container of gitlab-webservice-default:

Completed 500 Internal Server Error in 59988ms (ActiveRecord: 59963.7ms | Elasticsearch: 0.0ms | Allocations: 1555093)
Rack::Timeout::RequestTimeoutException (Request ran for longer than 60000ms):
lib/gitlab/issuable_metadata.rb:67:in `map'
lib/gitlab/issuable_metadata.rb:67:in `block in issuable_ids'
lib/gitlab/utils/strong_memoize.rb:30:in `strong_memoize'
lib/gitlab/issuable_metadata.rb:62:in `issuable_ids'
lib/gitlab/issuable_metadata.rb:28:in `data'
app/controllers/concerns/issuable_collections.rb:49:in `set_pagination'
app/controllers/concerns/issuable_collections.rb:21:in `set_issuables_index'
app/controllers/application_controller.rb:483:in `set_current_admin'
lib/gitlab/session.rb:11:in `with_session'
app/controllers/application_controller.rb:474:in `set_session_storage'
lib/gitlab/i18n.rb:99:in `with_locale'
lib/gitlab/i18n.rb:105:in `with_user_locale'
app/controllers/application_controller.rb:468:in `set_locale'
app/controllers/application_controller.rb:462:in `set_current_context'
lib/gitlab/metrics/elasticsearch_rack_middleware.rb:16:in `call'
lib/gitlab/middleware/rails_queue_duration.rb:33:in `call'
lib/gitlab/metrics/rack_middleware.rb:16:in `block in call'
lib/gitlab/metrics/web_transaction.rb:21:in `run'
lib/gitlab/metrics/rack_middleware.rb:16:in `call'
lib/gitlab/middleware/speedscope.rb:13:in `call'
lib/gitlab/request_profiler/middleware.rb:17:in `call'
lib/gitlab/jira/middleware.rb:19:in `call'
lib/gitlab/middleware/go.rb:20:in `call'
lib/gitlab/etag_caching/middleware.rb:21:in `call'
lib/gitlab/middleware/multipart.rb:172:in `call'
lib/gitlab/middleware/read_only/controller.rb:50:in `call'
lib/gitlab/middleware/read_only.rb:18:in `call'
lib/gitlab/middleware/same_site_cookies.rb:27:in `call'
lib/gitlab/middleware/handle_malformed_strings.rb:21:in `call'
lib/gitlab/middleware/basic_health_check.rb:25:in `call'
lib/gitlab/middleware/handle_ip_spoof_attack_error.rb:25:in `call'
lib/gitlab/middleware/request_context.rb:21:in `call'
config/initializers/fix_local_cache_middleware.rb:11:in `call'
lib/gitlab/middleware/rack_multipart_tempfile_factory.rb:19:in `call'
lib/gitlab/metrics/requests_rack_middleware.rb:74:in `call'
lib/gitlab/middleware/release_env.rb:12:in `call'
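
In the "Completed 500" line above, 59963.7 ms of the 59988 ms request is attributed to ActiveRecord, so almost all of the time before the 60 s Rack timeout was spent waiting on the database (inside the issuable list pagination). To see how many requests look like this during an outage, a minimal sketch like the following could scan the webservice container's log (for example the output of kubectl logs with -c webservice). The regex is our own assumption about the Rails summary line format, based only on the line shown here.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# Assumed shape: "Completed <status> <text> in <N>ms (... ActiveRecord: <M>ms ...)"
COMPLETED_RE = re.compile(
    r"Completed (?P<status>\d{3}) .*? in (?P<total>\d+(?:\.\d+)?)ms"
    r" \(.*?ActiveRecord: (?P<db>\d+(?:\.\d+)?)ms"
)


class RequestTiming(NamedTuple):
    status: int
    total_ms: float
    db_ms: float

    @property
    def db_share(self) -> float:
        # Fraction of the request's wall time spent in ActiveRecord (database) calls.
        return self.db_ms / self.total_ms if self.total_ms else 0.0


def parse_completed(line: str) -> Optional[RequestTiming]:
    """Return the timing from a Rails "Completed ..." line, or None for other lines."""
    m = COMPLETED_RE.search(line)
    if m is None:
        return None
    return RequestTiming(int(m["status"]), float(m["total"]), float(m["db"]))


def db_bound_requests(lines: Iterable[str], min_share: float = 0.9) -> Iterator[RequestTiming]:
    """Yield timings for requests that spent at least `min_share` of their time in the DB."""
    for line in lines:
        t = parse_completed(line)
        if t is not None and t.db_share >= min_share:
            yield t


line = ("Completed 500 Internal Server Error in 59988ms "
        "(ActiveRecord: 59963.7ms | Elasticsearch: 0.0ms | Allocations: 1555093)")
t = parse_completed(line)
print(f"{t.status}: {t.total_ms:.0f} ms total, {t.db_share:.2%} in ActiveRecord")
# prints "500: 59988 ms total, 99.96% in ActiveRecord"
```

If most slow or failed requests during an outage are database-bound like this one, that would be worth comparing against the database's own metrics, since the instance is fairly small (1 vCPU, 2 GB).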

Logs from gitlab-sidekiq-all-in-1-v1:

/var/log/gitlab/application_json.log
{"severity":"ERROR","time":"2021-09-20T07:37:40.385Z","correlation_id":"****************","message":"Cannot obtain an exclusive lease for ci/pipeline_processing/atomic_processing_service::pipeline_id:198089. There must be another instance already in execution."}
*** /var/log/gitlab/application.log ***
2021-09-20T07:37:40.385Z: Cannot obtain an exclusive lease for ci/pipeline_processing/atomic_processing_service::pipeline_id:198089. There must be another instance already in execution.
2021-09-20T07:37:40.905Z 11 TID-7auvdj AutoMergeProcessWorker JID-aa9bf8f29fb2e49914639df9 INFO: done: 2.532 sec
2021-09-20T07:37:41.449Z 11 TID-3pfr PipelineMetricsWorker JID-483e26711c2c62d99ea8f3d3 INFO: start
2021-09-20T07:37:41.451Z 11 TID-3pfr PipelineMetricsWorker JID-483e26711c2c62d99ea8f3d3 INFO: arguments: [198078]
2021-09-20T07:37:41.455Z 11 TID-7auikb Ci::PipelineSuccessUnlockArtifactsWorker JID-f9f6ec2aadab8174b4b00af0 INFO: start
2021-09-20T07:37:41.458Z 11 TID-7auikb Ci::PipelineSuccessUnlockArtifactsWorker JID-f9f6ec2aadab8174b4b00af0 INFO: arguments: [198078]
2021-09-20T07:37:41.566Z 11 TID-6jithv PipelineProcessWorker JID-97fd11bb3a023bd43251cf15 INFO: done: 5.85 sec
2021-09-20T07:37:41.822Z 11 TID-3pfr PipelineMetricsWorker JID-483e26711c2c62d99ea8f3d3 INFO: done: 0.373 sec
2021-09-20T07:37:41.832Z 11 TID-u13x3 PipelineProcessWorker JID-4f04e477d6fe58d859383f54 INFO: done: 1.014 sec
2021-09-20T07:37:42.012Z 11 TID-a90frz ExpirePipelineCacheWorker JID-609ff44fe31d3d9859b1ce34 INFO: done: 11.855 sec
2021-09-20T07:37:42.254Z 11 TID-5511m7 Ci::BuildTraceChunkFlushWorker JID-8863294f6566e6e579700963 INFO: start
2021-09-20T07:37:42.257Z 11 TID-5511m7 Ci::BuildTraceChunkFlushWorker JID-8863294f6566e6e579700963 INFO: arguments: [945673]
2021-09-20T07:37:42.337Z 11 TID-34ez ExpirePipelineCacheWorker JID-a38c6bfbedce5411a47160b0 INFO: done: 2.453 sec
sh: 1: /usr/sbin/sendmail: not found
2021-09-20T07:37:42.609Z 11 TID-7l7fpz ActionMailer::MailDeliveryJob JID-4b37c8da3098596af765745e INFO: done: 11.111 sec
2021-09-20T07:37:42.678Z 11 TID-9xkv7v ExpireJobCacheWorker JID-a7f33dfeb59a93be7d1281ca INFO: done: 5.546 sec
2021-09-20T07:37:42.731Z 11 TID-776x3f ExpireJobCacheWorker JID-ad6e1bdf8ec0e24625cf2b10 INFO: done: 2.357 sec
2021-09-20T07:37:42.849Z 11 TID-36s3 ExpirePipelineCacheWorker JID-6251b8ebd0b2da371fc36083 INFO: done: 4.485 sec
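
Several Sidekiq jobs in this excerpt take much longer than the rest (ExpirePipelineCacheWorker at 11.855 s and ActionMailer::MailDeliveryJob at 11.111 s). A minimal sketch for summarizing job durations per worker over a longer window is below. It assumes every "done" line has exactly the layout shown above (timestamp, PID, TID, worker class, JID, "INFO: done: N sec"); lines that do not match, such as "start", "arguments", or the sendmail error, are skipped.

```python
import re
from collections import defaultdict

# Assumed layout: "<timestamp> <pid> TID-<id> <Worker> JID-<id> INFO: done: <secs> sec"
DONE_RE = re.compile(
    r"^\S+ \d+ TID-\S+ (?P<worker>\S+) JID-\S+ INFO: done: (?P<secs>\d+(?:\.\d+)?) sec"
)


def slowest_workers(lines, threshold_sec=0.0):
    """Return (worker, job_count, max_seconds) tuples, slowest worker first.

    Only jobs that took at least `threshold_sec` seconds are counted.
    """
    stats = defaultdict(lambda: [0, 0.0])  # worker -> [job count, max seconds]
    for line in lines:
        m = DONE_RE.match(line)
        if not m:
            continue  # not a "done" line
        secs = float(m["secs"])
        if secs < threshold_sec:
            continue
        entry = stats[m["worker"]]
        entry[0] += 1
        entry[1] = max(entry[1], secs)
    return sorted(
        ((worker, count, worst) for worker, (count, worst) in stats.items()),
        key=lambda row: row[2],
        reverse=True,
    )


sample = [
    "2021-09-20T07:37:42.012Z 11 TID-a90frz ExpirePipelineCacheWorker JID-609ff44fe31d3d9859b1ce34 INFO: done: 11.855 sec",
    "2021-09-20T07:37:42.337Z 11 TID-34ez ExpirePipelineCacheWorker JID-a38c6bfbedce5411a47160b0 INFO: done: 2.453 sec",
    "2021-09-20T07:37:42.609Z 11 TID-7l7fpz ActionMailer::MailDeliveryJob JID-4b37c8da3098596af765745e INFO: done: 11.111 sec",
    "2021-09-20T07:37:41.449Z 11 TID-3pfr PipelineMetricsWorker JID-483e26711c2c62d99ea8f3d3 INFO: start",
]
for worker, jobs, worst in slowest_workers(sample):
    print(f"{worker}: {jobs} job(s), slowest {worst:.3f} s")
```

Running it over the Sidekiq log around the time of each outage, and again during a quiet period, would show whether particular workers slow down together with the web UI.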