GitLab CE, high CPU load with no running tasks

Hey folks.

Running GitLab 14.4.4 on CentOS 7.9.

Everything is up-to-date.

I noticed that, perhaps after upgrading to version 14 (after getting hit by the unauthenticated RCE), our server's CPU load flaps non-stop.

CPU utilization itself is low, but the CPU load is high.

There is only one background task running; killing it makes no difference, since GitLab restarts it shortly after.

There is not even any I/O wait being recorded, so I have absolutely no idea what's causing the load.

There is not much I/O or read/write activity either.
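One thing worth checking in this situation: on Linux, the load average counts not only runnable tasks but also tasks stuck in uninterruptible sleep (state D), which burn no CPU and may not show up as I/O wait either. A quick sketch of the checks (the `mpstat` command assumes the sysstat package is installed):

```shell
# 1/5/15-minute load averages straight from the kernel
cat /proc/loadavg

# Per-CPU utilisation, including %iowait (requires the sysstat package)
mpstat -P ALL 1 3

# Tasks in uninterruptible sleep (state D) count toward the load
# average even though they use no CPU time at all
ps -eo state,pid,cmd | awk '$1 ~ /^D/'
```

If the D-state list is non-empty during a load spike, the culprit is blocked on something (disk, NFS, a stuck syscall) rather than computing.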

Sidekiq reports the following queues as being processed:



REDACTED:780384 queues:authorized_project_update:authorized_project_update_project_create,authorized_project_update:authorized_project_update_project_group_link_create,authorized_project_update:authorized_project_update_project_recalculate,authorized_project_update:authorized_project_update_project_recalculate_per_user,authorized_project_update:authorized_project_update_user_refresh_from_replica,authorized_project_update:authorized_project_update_user_refresh_over_user_range,authorized_project_update:authorized_project_update_user_refresh_with_low_urgency,auto_devops:auto_devops_disable,auto_merge:auto_merge_process,chaos:chaos_cpu_spin,chaos:chaos_db_spin,chaos:chaos_kill,chaos:chaos_leak_mem,chaos:chaos_sleep,container_repository:cleanup_container_repository,container_repository:container_expiration_policies_cleanup_container_repository,container_repository:delete_container_repository,cronjob:admin_email,cronjob:analytics_usage_trends_count_job_trigger,cronjob:authorized_project_update_periodic_recalculate,cronjob:ci_archive_traces_cron,cronjob:ci_delete_unit_tests,cronjob:ci_pipeline_artifacts_expire_artifacts,cronjob:ci_platform_metrics_update_cron,cronjob:ci_schedule_delete_objects_cron,cronjob:ci_stuck_builds_drop_running,cronjob:ci_stuck_builds_drop_scheduled,cronjob:container_expiration_policy,cronjob:database_batched_background_migration,cronjob:database_drop_detached_partitions,cronjob:database_partition_management,cronjob:dependency_proxy_image_ttl_group_policy,cronjob:environments_auto_delete_cron,cronjob:environments_auto_stop_cron,cronjob:expire_build_artifacts,cronjob:gitlab_service_ping,cronjob:import_export_project_cleanup,cronjob:import_stuck_project_import_jobs,cronjob:issue_due_scheduler,cronjob:jira_import_stuck_jira_import_jobs,cronjob:member_invitation_reminder_emails,cronjob:metrics_dashboard_schedule_annotations_prune,cronjob:namespaces_in_product_marketing_emails,cronjob:namespaces_prune_aggregation_schedules,cronjob:packages_composer_cache
_cleanup,cronjob:pages_domain_removal_cron,cronjob:pages_domain_ssl_renewal_cron,cronjob:pages_domain_verification_cron,cronjob:partition_creation,cronjob:personal_access_tokens_expired_notification,cronjob:personal_access_tokens_expiring,cronjob:pipeline_schedule,cronjob:prune_old_events,cronjob:releases_manage_evidence,cronjob:remove_expired_group_links,cronjob:remove_expired_members,cronjob:remove_unaccepted_member_invites,cronjob:remove_unreferenced_lfs_objects,cronjob:repository_archive_cache,cronjob:repository_check_dispatch,cronjob:requests_profiles,cronjob:schedule_merge_request_cleanup_refs,cronjob:schedule_migrate_external_diffs,cronjob:ssh_keys_expired_notification,cronjob:ssh_keys_expiring_soon_notification,cronjob:stuck_ci_jobs,cronjob:stuck_export_jobs,cronjob:stuck_merge_jobs,cronjob:trending_projects,cronjob:update_container_registry_info,cronjob:user_status_cleanup_batch,cronjob:users_create_statistics,cronjob:users_deactivate_dormant_users,cronjob:x509_issuer_crl_check,dependency_proxy:purge_dependency_proxy_cache,dependency_proxy_blob:dependency_proxy_cleanup_blob,dependency_proxy_manifest:dependency_proxy_cleanup_manifest,deployment:deployments_drop_older_deployments,deployment:deployments_hooks,deployment:deployments_link_merge_request,deployment:deployments_update_environment,gcp_cluster:cluster_configure_istio,gcp_cluster:cluster_install_app,gcp_cluster:cluster_patch_app,gcp_cluster:cluster_provision,gcp_cluster:cluster_update_app,gcp_cluster:cluster_upgrade_app,gcp_cluster:cluster_wait_for_app_installation,gcp_cluster:cluster_wait_for_app_update,gcp_cluster:cluster_wait_for_ingress_ip_address,gcp_cluster:clusters_applications_activate_service,gcp_cluster:clusters_applications_deactivate_service,gcp_cluster:clusters_applications_uninstall,gcp_cluster:clusters_applications_wait_for_uninstall_app,gcp_cluster:clusters_cleanup_project_namespace,gcp_cluster:clusters_cleanup_service_account,gcp_cluster:wait_for_cluster_creation,github_importer:githu
b_import_import_diff_note,github_importer:github_import_import_issue,github_importer:github_import_import_lfs_object,github_importer:github_import_import_note,github_importer:github_import_import_pull_request,github_importer:github_import_import_pull_request_merged_by,github_importer:github_import_import_pull_request_review,github_importer:github_import_refresh_import_jid,github_importer:github_import_stage_finish_import,github_importer:github_import_stage_import_base_data,github_importer:github_import_stage_import_issues_and_diff_notes,github_importer:github_import_stage_import_lfs_objects,github_importer:github_import_stage_import_notes,github_importer:github_import_stage_import_pull_requests,github_importer:github_import_stage_import_pull_requests_merged_by,github_importer:github_import_stage_import_pull_requests_reviews,github_importer:github_import_stage_import_repository,hashed_storage:hashed_storage_migrator,hashed_storage:hashed_storage_project_migrate,hashed_storage:hashed_storage_project_rollback,hashed_storage:hashed_storage_rollbacker,incident_management:clusters_applications_check_prometheus_health,incident_management:incident_management_add_severity_system_note,incident_management:incident_management_pager_duty_process_incident,incident_management:incident_management_process_alert_worker_v2,jira_connect:jira_connect_forward_event,jira_connect:jira_connect_retry_request,jira_connect:jira_connect_sync_branch,jira_connect:jira_connect_sync_builds,jira_connect:jira_connect_sync_deployments,jira_connect:jira_connect_sync_feature_flags,jira_connect:jira_connect_sync_merge_request,jira_connect:jira_connect_sync_project,jira_importer:jira_import_advance_stage,jira_importer:jira_import_import_issue,jira_importer:jira_import_stage_finish_import,jira_importer:jira_import_stage_import_attachments,jira_importer:jira_import_stage_import_issues,jira_importer:jira_import_stage_import_labels,jira_importer:jira_import_stage_import_notes,jira_importer:jira_import_stage_s
tart_import,mail_scheduler:mail_scheduler_issue_due,mail_scheduler:mail_scheduler_notification_service,object_pool:object_pool_create,object_pool:object_pool_destroy,object_pool:object_pool_join,object_pool:object_pool_schedule_join,object_storage:object_storage_background_move,object_storage:object_storage_migrate_uploads,package_repositories:packages_debian_generate_distribution,package_repositories:packages_debian_process_changes,package_repositories:packages_go_sync_packages,package_repositories:packages_helm_extraction,package_repositories:packages_maven_metadata_sync,package_repositories:packages_nuget_extraction,package_repositories:packages_rubygems_extraction,pipeline_background:archive_trace,pipeline_background:ci_archive_trace,pipeline_background:ci_build_trace_chunk_flush,pipeline_background:ci_daily_build_group_report_results,pipeline_background:ci_pipeline_artifacts_coverage_report,pipeline_background:ci_pipeline_artifacts_create_quality_report,pipeline_background:ci_pipeline_success_unlock_artifacts,pipeline_background:ci_ref_delete_unlock_artifacts,pipeline_background:ci_test_failure_history,pipeline_cache:expire_job_cache,pipeline_cache:expire_pipeline_cache,pipeline_creation:ci_external_pull_requests_create_pipeline,pipeline_creation:create_pipeline,pipeline_creation:merge_requests_create_pipeline,pipeline_creation:run_pipeline_schedule,pipeline_default:ci_create_cross_project_pipeline,pipeline_default:ci_create_downstream_pipeline,pipeline_default:ci_drop_pipeline,pipeline_default:ci_merge_requests_add_todo_when_build_fails,pipeline_default:ci_pipeline_bridge_status,pipeline_default:ci_retry_pipeline,pipeline_default:pipeline_metrics,pipeline_default:pipeline_notification,pipeline_hooks:build_hooks,pipeline_hooks:pipeline_hooks,pipeline_processing:build_finished,pipeline_processing:build_queue,pipeline_processing:build_success,pipeline_processing:ci_build_finished,pipeline_processing:ci_build_prepare,pipeline_processing:ci_build_schedule,pipeline_
processing:ci_initial_pipeline_process,pipeline_processing:ci_resource_groups_assign_resource_from_resource_group,pipeline_processing:pipeline_process,pipeline_processing:stage_update,pipeline_processing:update_head_pipeline_for_merge_request,repository_check:repository_check_batch,repository_check:repository_check_clear,repository_check:repository_check_single_repository,todos_destroyer:todos_destroyer_confidential_issue,todos_destroyer:todos_destroyer_destroyed_designs,todos_destroyer:todos_destroyer_destroyed_issuable,todos_destroyer:todos_destroyer_entity_leave,todos_destroyer:todos_destroyer_group_private,todos_destroyer:todos_destroyer_private_features,todos_destroyer:todos_destroyer_project_private,unassign_issuables:members_destroyer_unassign_issuables,update_namespace_statistics:namespaces_root_statistics,update_namespace_statistics:namespaces_schedule_aggregation,analytics_usage_trends_counter_job,approve_blocked_pending_approval_users,authorized_keys,authorized_projects,background_migration,bulk_import,bulk_imports_entity,bulk_imports_export_request,bulk_imports_pipeline,bulk_imports_relation_export,chat_notification,ci_delete_objects,create_commit_signature,create_note_diff_file,default,delete_diff_files,delete_merged_branches,delete_stored_files,delete_user,design_management_copy_design_collection,design_management_new_version,destroy_pages_deployments,detect_repository_languages,disallow_two_factor_for_group,disallow_two_factor_for_subgroups,email_receiver,emails_on_push,environments_auto_stop,environments_canary_ingress_update,error_tracking_issue_link,experiments_record_conversion_event,expire_build_instance_artifacts,export_csv,external_service_reactive_caching,file_hook,flush_counter_increments,github_import_advance_stage,gitlab_performance_bar_stats,gitlab_shell,group_destroy,group_export,group_import,import_issues_csv,invalid_gpg_signature_update,irker,issuable_export_csv,issuable_label_links_destroy,issuables_clear_groups_issue_counter,issue_pla
cement,issue_rebalancing,mailers,merge,merge_request_cleanup_refs,merge_request_mergeability_check,merge_requests_delete_source_branch,merge_requests_handle_assignees_change,merge_requests_resolve_todos,metrics_dashboard_prune_old_annotations,metrics_dashboard_sync_dashboards,migrate_external_diffs,namespaceless_project_destroy,namespaces_onboarding_issue_created,namespaces_onboarding_pipeline_created,namespaces_onboarding_progress,namespaces_onboarding_user_added,new_issue,new_merge_request,new_note,packages_composer_cache_update,pages,pages_domain_ssl_renewal,pages_domain_verification,pages_transfer,pages_update_configuration,phabricator_import_import_tasks,post_receive,process_commit,project_cache,project_daily_statistics,project_destroy,project_export,project_service,projects_git_garbage_collect,projects_post_creation,projects_schedule_bulk_repository_shard_moves,projects_update_repository_storage,prometheus_create_default_alerts,propagate_integration,propagate_integration_group,propagate_integration_inherit,propagate_integration_inherit_descendant,propagate_integration_project,propagate_service_template,reactive_caching,rebase,releases_create_evidence,remote_mirror_notification,repository_cleanup,repository_fork,repository_import,repository_remove_remote,repository_update_remote_mirror,self_monitoring_project_create,self_monitoring_project_delete,service_desk_email_receiver,snippets_schedule_bulk_repository_shard_moves,snippets_update_repository_storage,system_hook_push,update_external_pull_requests,update_highest_role,update_merge_requests,update_project_statistics,upload_checksum,web_hook,web_hooks_destroy,web_hooks_log_execution,wikis_git_garbage_collect,x509_certificate_revoke

Any ideas? I would be glad to hear thoughts on what could be causing this and how to fix it.

Personally, I don't see any problems with your system from what you have shown in your post. CPU isn't high, or it would show in htop/top. And the Sidekiq process is normal.

The only problem I see is that your server doesn't have enough CPU resources. GitLab's hardware requirements documentation states that 4 CPUs are required. Therefore, I suggest you run this VPS/VM (or whatever it is) with 4 CPUs, not 2. I would expect that to solve most of the problems you might have.

While increasing the CPU cores assigned to the VM is easy, I do not see how this will resolve the issue.

You are correct that nothing is causing high CPU utilization, but for some reason the CPU load is high (those are two different things).
And I don't remember having high CPU load on 13.x.x.
Is there a way to debug this somehow?
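One generic way to catch intermittent spikes like this is a small sampler that snapshots the busiest processes whenever the 1-minute load crosses a threshold. This is a hypothetical sketch (the log path and threshold are placeholders to adapt):

```shell
#!/bin/sh
# Hypothetical load sampler: every 10 s, if the 1-minute load
# exceeds THRESHOLD, snapshot the top processes to a log file so
# the culprit is captured even for short spikes.
THRESHOLD=1.0
LOG=/var/log/load-snapshots.log

while true; do
    load=$(cut -d' ' -f1 /proc/loadavg)
    # awk handles the floating-point comparison portably
    over=$(awk -v l="$load" -v t="$THRESHOLD" 'BEGIN { print (l > t) }')
    if [ "$over" -eq 1 ]; then
        {
            date
            echo "loadavg: $(cat /proc/loadavg)"
            ps -eo pid,state,pcpu,pmem,cmd --sort=-pcpu | head -15
        } >> "$LOG"
    fi
    sleep 10
done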

Your machine doesn't meet GitLab's minimum requirements. GitLab requires 4 CPUs; yours has 2. That alone can be the reason for high load across 2 CPUs when 4 are required. You need to address this.

If you don't wish to address the CPU issue, then there is no way to debug further until you do. Imagine a motorway during rush hour: one has 2 lanes (your server), the other has 4 lanes (my server). The 4-lane motorway will be less congested than the 2-lane one, which means your CPU load is most likely because of this.

As I said, increasing the VM core count is easy. I already did that.

But I still fail to see how this resolves the issue.

I still get high CPU load, periodically, for no apparent reason.

This is in the middle of the night. No one is using the server at that time, 100% sure. We have ~250 total users, about 170 of them active.

Here is what my monitoring shows:

So your max hits 1.11, 0.48, 0.37. Mine is 5 times higher at 6.45, 1.93, 0.998, and yet I do not get critical alerts. Perhaps the problem is that your monitoring thresholds are configured too low. My peaks are also at around 04:00, so this is likely due to a Sidekiq job.

What…? The issue is not with the monitoring.
As I said in the initial post, I was not getting such CPU loads previously. Once I upgraded from 13.x.x, I started observing this.
The reason I am here is to try to debug what is causing such frequent CPU load spikes.
And the CPU load occurs all through the day, not only at night.

GitLab 13 is not the same as GitLab 14. I just showed you my GitLab 14 installation and also demonstrated that my monitoring isn't causing any problems. I suggested that your monitoring threshold for CPU load is too low and perhaps you should reconfigure it to be higher. I have no wish to help you any further since you are not listening to any guidance. You seem to think you know better. You obviously don't want to listen to any assistance, so good luck with your problem!

> I have no wish to help you any further since you are not listening to any guidance. You seem to think you know better. You obviously don't want to listen to any assistance so good luck with your problem!

You suggested increasing the assigned CPU cores, since I did not meet the "minimum requirements". I complied with that; the VM now has 4 cores and 8 GB RAM.

> I suggested that your monitoring of CPU load is too low and perhaps you should reconfigure it to be higher.

“Too low” is subjective. A 1.0 CPU load on 2 cores is akin to 50% of the resources being utilized; on 4 cores it would be akin to 25%.
You fail to notice what I want: to know WHY I am getting 1.0+ CPU load, NOT why I am getting notifications from my monitoring.
A 1.0+ CPU load on an absolutely idle instance, periodically, throughout the day, does NOT seem normal to me.
I asked how I can debug this, but your answer reads as if you expect me to follow everything to the dot, as if it were an order.
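The load-per-core arithmetic above can be computed directly from `/proc/loadavg`, a quick sketch:

```shell
# Normalised load = 1-minute load average divided by core count.
# A result near 100% means the run queue matches the core count.
cores=$(nproc)
awk -v c="$cores" \
    '{ printf "load %.2f on %d cores = %.0f%% of capacity\n", $1, c, ($1 / c) * 100 }' \
    /proc/loadavg
```

This is the number worth alerting on, since a raw load threshold means different things on 2-core and 4-core machines.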

> Gitlab 13 is not the same as Gitlab 14.

We are not using any of the new features that come with GitLab 14, and whatever we are not using that can be disabled is disabled.
I have not seen any mention in the changelogs of an increase in CPU load or CPU requirements.

I have much higher load than you (5x) and don't have any problems. I fail to understand why mine works perfectly fine with 5x higher load while you think you have a problem with yours at 5x lower. I have 4 CPUs and 8 GB RAM. As far as I can see, there is no CPU load at all in comparison to mine. So I am curious why you think yours is overloaded, when I have shown a production GitLab with 5x more load and no problems.

I'd really like you to explain why yours doesn't work properly, which you have failed to do, considering your load is 5x lower than mine. As far as I can see, you have no problems, but you are insisting that you do.

Just because GitLab 13 had lower CPU load doesn't mean there is a problem. So it uses more resources; that would be like me wondering why Windows 95 required more resources than Windows 3.1.

If you mean that this is expected behavior and not an actual issue, sure.
That's one of the things I am trying to find out.

I have no idea about the scheduled Sidekiq background tasks, so I am not aware of how intensive they can be. That is why I asked for help debugging this.

Throwing more CPU resources at it is not really a solution if those resources are not actually being utilized (as can be seen, going from 2 to 4 cores did not reduce the duration, frequency, or intensity of the CPU load).

I am reluctant to raise the CPU load limit in the monitoring: when we got hit by the RCE, the limit was already pretty high, at 2.0. Because of that, it failed to catch the VM being used to mine crypto, since only one core was set to mine and the load stayed between 1.0 and 2.0.
If there is nothing else to be done, I would consider switching to CPU utilization monitoring, which seems a better fit for this behavior.

For monitoring, you should generally watch both CPU utilization and load, not just one or the other. A combination of signals gives a decent idea of what is going on; memory usage as well.
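Both signals can be read in one shot from `/proc`: load comes straight from `/proc/loadavg`, and utilization from two samples of `/proc/stat` taken a second apart. A minimal sketch:

```shell
# Load averages: first three fields of /proc/loadavg
read l1 l5 l15 _ < /proc/loadavg

# Utilisation: delta between two /proc/stat samples, 1 s apart.
# Fields 5 and 6 of the aggregate "cpu" line are idle and iowait.
s1=$(head -1 /proc/stat); sleep 1; s2=$(head -1 /proc/stat)
util=$(awk -v a="$s1" -v b="$s2" 'BEGIN {
    split(a, x); split(b, y)
    idle = (y[5] + y[6]) - (x[5] + x[6])
    total = 0
    for (i = 2; i <= 11; i++) total += y[i] - x[i]
    printf "%.1f", (1 - idle / total) * 100
}')

echo "load: $l1 $l5 $l15  cpu-util: ${util}%"
```

Seeing high load alongside near-zero utilization, as in this thread, points at run-queue pressure or D-state tasks rather than compute.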

The first thing I suggested was increasing from 2 to 4 CPUs, because you were way below the recommended specifications per the documentation. The requirements have to be met, or everyone will just reply that the specs are too low and there won't be any further help until that has been addressed. There is no need to go higher than this until your server exceeds the limits per the matrix: Reference architecture: up to 1,000 users | GitLab

Yours is good for 500 users with 4 CPUs and 8 GB RAM, but each usage scenario varies. 4 CPUs at 2 GHz aren't the same as 4 CPUs at 4 GHz, so in some scenarios, even with 500 users, the CPU requirements might be higher than the matrix suggests.

EDIT:

I checked: an AWS c5.xlarge runs at 3.6 GHz. So with 2 GHz CPUs, for example, 6 CPUs or more may be required, assuming a ~1.5 GHz difference.

I run on 4 x 2.4 GHz CPUs and this is fine for me, no performance/load issues.

Theoretically, mine is probably lower than required. If we take 3.6 GHz and multiply by 4 CPUs, we have approximately 14.4 GHz total. At 2.4 GHz, I would need 6 CPUs to reach the equivalent 14.4 GHz. Maybe not a brilliant calculation, probably a bit simplified, but it can give an idea for figuring out the requirements.
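The back-of-envelope arithmetic above, expressed as a one-liner (the clock speeds are the example figures from this thread, not official sizing guidance):

```shell
# "GHz-cores" equivalence sketch: reference capacity is 4 cores at
# 3.6 GHz; how many 2.4 GHz cores give the same aggregate clock?
awk 'BEGIN {
    ref  = 4 * 3.6        # 14.4 GHz-cores of reference capacity
    need = ref / 2.4      # cores needed at the slower clock
    printf "%.1f GHz-cores -> %.0f cores at 2.4 GHz\n", ref, need
}'
# -> 14.4 GHz-cores -> 6 cores at 2.4 GHz
```

As noted, this ignores IPC differences between CPU generations, so treat it as a rough heuristic only.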

I had a similar issue here when I upgraded from 14.5.2 to 14.6.0.
While examining the logs, I detected a Sidekiq boot/shutdown loop that was causing the high CPU load in my case. I run GitLab on a VPS with 2 CPU cores with no issues.
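A quick way to check for that kind of restart loop on an Omnibus install (the log path is the Omnibus default; the matched phrases are approximate, so eyeball a few log lines first):

```shell
# Count boot/shutdown events in the current Sidekiq log; more than
# a handful per hour suggests a restart loop, not normal operation.
grep -ciE "booting|shutting down" /var/log/gitlab/sidekiq/current

# runit restarts crashed services automatically, so a persistently
# short uptime reported here is another sign of a loop:
gitlab-ctl status sidekiq
```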

I found my solution by simply downgrading the gitlab-ce package to 14.5.2. I say "simply" because I didn't need to modify the GitLab configuration or restore from any backup.

Nevertheless, I deployed a Debian 11 VPS and did a clean install of gitlab-ce 14.6.0 to try to reproduce this behaviour, but there were no sidekiq loops… Stranger things :slight_smile:

Maybe you need to downgrade your GitLab installation and restore from a backup.

@fede That’s a bit of a contradiction :slight_smile: it works with no issues, but then you have an issue with high CPU load. Sure, downgrading may have solved it, but that is just the same as me running Windows 10 and not upgrading to Windows 11 because my machine doesn’t have the required specifications (a missing TPM, for example, never mind a possible lack of CPU/RAM power). Fine for a while, a big security risk later when issues occur, like the recent GitLab RCE problem with cryptominers overloading servers.

I was therefore curious and went back over my entire monitoring history for the last year, and I do not see such problems. I upgrade my system regularly, and I also ensure that my server meets the minimum requirements in the GitLab documentation, apart from disabling some stuff I don’t need, like Prometheus and Grafana. I certainly don’t see an increase in load like yours from 14.5.x to 14.6. But then I do meet the minimum hardware requirements outlined.

If you are happy running it downgraded to 14.5.2 on 2 CPUs, fair enough. I just don’t recommend it, considering the load issues you encountered, as well as the security risk of staying on an old version because your machine lacks the CPU resources to upgrade.

Technically, what you could have done is: take a backup of the machine after the upgrade, make a clean install of GitLab, restore gitlab.rb and gitlab-secrets.json to /etc/gitlab on the new server, run gitlab-ctl reconfigure, restore the backup from the old server, and see if the hardware overload started again. It probably would after restoring the backup, since an empty clean install is not the same as a server running with a load of repos, etc.
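Sketched as a script, that procedure would look roughly like this. To be clear, this is a hedged outline: the backup ID is a placeholder, and the commands are wrapped in a function so nothing runs by accident:

```shell
#!/bin/sh
# Hypothetical sketch of the migrate-and-verify procedure described above.
# BACKUP_ID is a placeholder; substitute the archive name gitlab-backup creates.
migrate_and_test() {
  # On the old server, after the upgrade: take a full application backup.
  gitlab-backup create

  # On the fresh server: install the same GitLab version, copy over
  # /etc/gitlab/gitlab.rb and /etc/gitlab/gitlab-secrets.json, then apply them.
  gitlab-ctl reconfigure

  # Stop the services that write to the database before restoring.
  gitlab-ctl stop puma
  gitlab-ctl stop sidekiq
  gitlab-backup restore BACKUP="$BACKUP_ID"   # placeholder archive name
  gitlab-ctl reconfigure
  gitlab-ctl start

  # Now watch whether the load problem reappears with real data restored.
}
```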

@fede Can you show an example from the sidekiq log? I am trying to work out whether it’s the same thing, but I don’t know what to look for.

Sorry if I have not expressed myself well.
The “contradiction” occurs only after I update to the latest version, not before. :slight_smile:

This is my configuration:
Puma port: 9999
Nginx port: 9091
auth_backend set to http://localhost:9999
Prometheus, alertmanager, Grafana, the exporters and Let’s Encrypt are disabled.
The other SMTP and time zone settings do not appear to be relevant to this problem.
The settings I have not named are commented out in the gitlab.rb file, so the default values are used.

I’ll explain the steps I have followed (several times, without success) to restore GitLab from version 14.6.0:

  1. uninstall gitlab-ce using apt (with purge).

  2. delete the directories:
    /etc/gitlab/
    /var/opt/gitlab/
    /var/log/gitlab/
    /opt/gitlab/
    /run/gitlab/

  3. install gitlab-ce 14.6.0 with apt and watch sidekiq reboot indefinitely. The instance is completely empty; I have not even modified the configuration or restored any backup.
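For reference, steps 1-3 as a script (Debian/Ubuntu package names; destructive, hence wrapped in a function rather than executed here):

```shell
#!/bin/sh
# The purge-and-reinstall procedure described above, as a sketch.
# The pinned version string follows the gitlab-ce Debian package naming.
clean_reinstall() {
  apt-get purge -y gitlab-ce
  rm -rf /etc/gitlab /var/opt/gitlab /var/log/gitlab /opt/gitlab /run/gitlab
  apt-get update
  apt-get install -y gitlab-ce=14.6.0-ce.0
}
```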

After the new year I will try to update again.

I don’t have the log handy now, but sidekiq stays up for 1 second, then shuts down and another normal boot occurs. During this infinite sidekiq reboot loop, CPU usage is 97-100%.

I have not observed any errors in the logs of the other gitlab-ce components.
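One way to capture the loop without keeping the full log: sample sidekiq’s reported uptime a few times; if the seconds counter keeps resetting instead of growing, the service is being restarted. A sketch, assuming an Omnibus install (wrapped in a function, run it manually):

```shell
#!/bin/sh
# Sample sidekiq's uptime every few seconds. A healthy service shows a
# steadily growing "(pid N) Xs" counter; a crash-looping one keeps resetting.
watch_sidekiq() {
  for _ in 1 2 3 4 5; do
    gitlab-ctl status sidekiq    # e.g. "run: sidekiq: (pid 668) 318s; ..."
    sleep 5
  done
}
```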

Seems strange. I have a Debian 11 instance with 14.6.x and it seems OK, although this was originally a buster package installed and upgraded from earlier 14.x versions. Out of curiosity I’m going to check it, as it’s my test server anyway and I usually only power it on and upgrade it before doing my production server.

EDIT: just checked:

root@gitlab:~# lsb_release -a
No LSB modules are available.
Distributor ID:	Debian
Description:	Debian GNU/Linux 11 (bullseye)
Release:	11
Codename:	bullseye

root@gitlab:~# dpkg -l | grep gitlab-ce
ii  gitlab-ce                      14.6.0-ce.0                    amd64        GitLab Community Edition (including NGINX, Postgres, Redis)

Status of services to make sure no weird restarting going on:

root@gitlab:~# gitlab-ctl status
run: gitaly: (pid 687) 317s; run: log: (pid 681) 318s
run: gitlab-workhorse: (pid 670) 318s; run: log: (pid 665) 318s
run: logrotate: (pid 667) 318s; run: log: (pid 664) 318s
run: nginx: (pid 669) 318s; run: log: (pid 666) 318s
run: postgresql: (pid 691) 317s; run: log: (pid 690) 317s
run: puma: (pid 676) 318s; run: log: (pid 675) 318s
run: redis: (pid 696) 316s; run: log: (pid 693) 316s
run: sidekiq: (pid 668) 318s; run: log: (pid 663) 318s

VM specs:

root@gitlab:~# cat /proc/cpuinfo | grep proc
processor	: 0
processor	: 1
processor	: 2
processor	: 3

root@gitlab:~# free
               total        used        free      shared  buff/cache   available
Mem:         8147784     2525672     4927944       92252      694168     5248384
Swap:       16777212           0    16777212

So 4 CPUs and 8 GB RAM. My sidekiq is not restarting all the time. The server has only been up 5 minutes and already uses 2.5 GB of RAM. The CPU is not overloaded though, but then it’s not really being used other than by the services doing their stuff in the background.

So if you have these sidekiq issues and tried this on a VPS with lower specs than the GitLab documentation outlines, I seriously suggest using 4 CPUs and 8 GB RAM, even though you disabled a load of stuff like Grafana and Prometheus. There obviously have been changes in requirements in 14.6.x compared with earlier versions, meaning what was possible before is becoming less and less possible. But this is normal, so it cannot be expected that GitLab will always run on hardware below the recommended specs.

Because GitLab is made up of Ruby/Rails, Nginx, Postgres, Puma and Gitaly, it is always going to use far more resources than, say, a LAMP server running Apache/Nginx, MySQL/MariaDB and PHP. While it is possible to run that kind of site on a server with 1 CPU and 2 GB RAM, it’s really on the limit; I know because I used to do it. Really, even that kind of server needs 2 CPUs and 4 GB RAM, especially once the site starts getting more visitors, since you then need Apache/Nginx and PHP to handle more requests, and more RAM allocated to MySQL/MariaDB as well. That is far fewer components than GitLab uses, so attempting to run all of GitLab’s components on a lower specification is generally going to be problematic, and not surprising really.

I just noticed this thread.
My gitlab-ce instance has been running for a long while.
Somewhere along the upgrade path from gitlab-ce 13 to 15, where I am now, I noticed a large increase in CPU load.

OS: Ubuntu 20.04; gitlab-ce: 15.0.2
Hardware: 4 CPUs, 8 GB RAM virtual host from DigitalOcean

My system only serves about 30 users, and only 1 or 2 are active on a given day, so previously (last year) I ran with only 2 CPUs and 4 GB RAM. But I started getting timeouts with an unresponsive browser at times, so I upgraded to the recommended hardware.

As this doubles my monthly cost, I’ve looked at what seems to be causing the load. I’m not sure yet, but I noticed a switch to defaulting to the sidekiq cluster at all times somewhere around version 13 or 14. The default uses:

sidekiq['max_concurrency'] = 50

I am now trying to lower the max concurrency to 10, and I seem to be getting average loads more like I used to: CPU utilization of 2-3% and load of 0.2-0.9.
Previously I would often have loads of 5-20 lasting an hour or more. (I’m using the DigitalOcean monitoring.)
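For anyone wanting to try the same experiment, the change is a one-line edit in /etc/gitlab/gitlab.rb followed by a reconfigure (10 is just the value I picked, not a recommended setting):

```ruby
# /etc/gitlab/gitlab.rb
# Lower Sidekiq's thread concurrency from the default of 50.
# Apply with: sudo gitlab-ctl reconfigure
sidekiq['max_concurrency'] = 10
```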

As I mentioned, I’m not sure if I’ve really solved the problem, but the experiment is going well, and I may try going back to the 2 CPU / 4 GB RAM system for a while to see if that works for me.

Posting in the hope that this might help someone else.