[merge request] "merge" action becomes very slow after upgrading to 13.1 from 11.0.1

In the past, a “merge” completed in seconds, but now it takes 3 min or longer.
I have 300 - 400 background jobs running. The more jobs there are, the longer the merge takes.
Please help, how can I debug this issue? Thanks!

up! need help :joy:

Is this anything related to “Auto DevOps”? I have disabled “Auto DevOps” in the admin area, but the issue still exists.

Hi,

please share some more details on your environment, such as the available resources for the host, the OS/distribution, and the number of projects/groups accessing your GitLab server.

Also, does your monitoring reveal any performance bottlenecks in metric graphs?

Cheers,
Michael

Thanks for your reply!
Sometimes we get a “pre-receive” error message like this:
“Merge failed: Something went wrong during merge pre-receive hook. Please try again.”
We then suggest that the user close and reopen the “Merge Request”; after that the MR completes successfully and the error does not occur.
In 13.1, gitaly-hooks is deployed as an executable file, so we cannot get the detailed failure reason.
Could you suggest a way to debug this? Or any possible cause?
Thanks!

Hi,

whenever this error happens, please look into the production logs to capture more context and possible errors.

Cheers,
Michael
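
As a starting point for the log investigation suggested above, here is a minimal Ruby sketch that lists the slowest requests from a JSON-formatted Rails log. It assumes an Omnibus install where the log is at /var/log/gitlab/gitlab-rails/production_json.log and that each line is a JSON object with "duration_s", "status", "method" and "path" fields; those field names are assumptions and may differ between GitLab versions, so the duration field is a parameter. The helper name slowest_requests is made up for this example.

```ruby
require 'json'

# Return the slowest requests found in JSON-formatted log lines.
# ASSUMPTION: each line is a JSON object and the request duration (in
# seconds) is stored under duration_key; adjust it for your version.
def slowest_requests(lines, threshold_s: 5.0, limit: 20, duration_key: 'duration_s')
  slow = []
  lines.each do |line|
    entry = begin
      JSON.parse(line)
    rescue JSON::ParserError
      nil # skip lines that are not valid JSON
    end
    next unless entry.is_a?(Hash)

    duration = entry[duration_key]
    next unless duration.is_a?(Numeric) && duration >= threshold_s

    slow << entry
  end
  # max_by(n) returns up to n entries, slowest first.
  slow.max_by(limit) { |e| e[duration_key] }
end

# Usage: ruby slow_requests.rb /var/log/gitlab/gitlab-rails/production_json.log
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  File.open(ARGV[0]) do |f|
    slowest_requests(f.each_line).each do |e|
      puts format('%8.2fs  %s  %s %s', e['duration_s'], e['status'], e['method'], e['path'])
    end
  end
end
```

Slow entries whose path points at a merge request would be the first ones to look at; if the log entries carry a correlation ID, it may help to match them against the Gitaly logs for the same request.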

@dnsmichi Thanks for your reply!
Some additional info:
7/3: upgraded GitLab 11.1.0 -> 13.1.0
7/14: added HTTPS support
Errors like the “pre-receive hook” failure occurred from
Sometimes users need to click the “Merge” button 3 times or more to complete the merge action.

My gitlab.rb config changes:
external_url 'https://git.lianjia.com'
nginx['redirect_http_to_https'] = true
nginx['ssl_certificate'] = "/etc/gitlab/trusted-certs/lianjia.com.crt"
nginx['ssl_certificate_key'] = "/etc/gitlab/trusted-certs/lianjia.com.key"

I also checked the production logs; they seem the same as before.
(I tried to solve the “can’t verify CSRF token” error by adding this nginx setting, with no effect)
nginx['proxy_set_headers'] = {
  "X-Forwarded-Proto" => "http",
  "CUSTOM_HEADER" => "VALUE"
}

This problem really bothers us; many users have complained that GitLab is unstable.
Please help. Thanks!

Also, the connection between GitLab and the runners has become very unstable.
Should I reboot the server?
Or set nginx['http2_enabled'] = false? (I saw another GitLab HTTPS config that turned off http2_enabled.)
Or any other suggestions?

More info:
OS: CentOS 7.5; CPU: 24 cores; Memory: 64G
Groups: 400+; Projects: 16000+
No performance issues before the upgrade
Now we have 3 problems:

  1. sometimes the “merge” action takes 1 min or longer
  2. sometimes the “merge” action fails with the “pre-receive hook” error and needs 2, 3 or more tries
  3. the connection between GitLab and the runners is unstable

Please help! Thanks a lot!

Sometimes we get errors like this:

Pipeline status:

  1. one commit triggers 2 pipelines
  2. pipelines start executing after a long delay
  3. the pipeline status stays “pending” or “created” until I click the refresh button or cancel it

Sorry for so many questions… I cannot sleep because of these problems…

Update:
The “pipeline pending” and “runner unstable” problems were solved after I stopped the backup GitLab service.
We have 2 GitLab servers, 2 isolated installations, but they share one database & one repositories storage.
This architecture worked well in 11.1.0, but it seems it is not supported in 13.1.0. The strange problems disappeared after I stopped the backup service.
I don’t know why, but I think it is something related to data access.

Hi,

share one database & one repositories storage

does that mean that you are using the same authorization/user for both GitLab servers … also, do they write into the same database? This might result in table- or row-based locking when transactions take a while. Your setup doesn’t sound small.

Some other ideas we’ve discussed:

  • Upgrading across 2 major versions (11 → 13) could mean that you skipped DB migrations by not following the incremental upgrade path
    • Worst case scenario is that you are missing important DB indexes, which makes SQL statements slow.
  • The 13.1 upgrade notes mention the CSRF token, so you could be affected by that.

To troubleshoot this further, I’d suggest checking the PostgreSQL server logs to see if you run into locked transactions. Also, monitor for slow query execution and analyse the queries from there. Please also share the PostgreSQL database version from your external host. 13.x requires at least PostgreSQL 11.

As a long-term plan, I would recommend rebuilding the environment with two separate databases, PostgreSQL HA, etc., and Gitaly Cluster when needed.

Cheers,
Michael