Persistent Project_Service Queue Job Failure with SSL Cert Failures

Looking at https://mygitlab.com/admin/background_jobs (sidekiq tab in Admin/Monitoring), I am getting lots of errors like this:

Queue =  project_service

Job = ProjectServiceWorker

Error =  OpenSSL::SSL::SSLError: SSL_connect returned=1 errno=0 state=error: certificate verify failed

Arguments = 
842, {"object_kind"=>"merge_request", "user"=>{"name"=>"user name", "username"=>"username", "avatar_url"=>"https://secure.gravatar.com/avatar/565489a5d93a21145506d8a886182?s=80&d=identicon"}, "project"=>{"name"=>"projectname", "description"=>"desc", ...

I think it has something to do with some embedded SSL CA file missing my signing authority but it’s very hard to find.
I can configure GIT so it works without any issues with this CA, but the above is probably some worker process (in ruby sidekiq jobs) that is using https urls against localhost unnecessarily. Ideally I’d like it to tell me what the URL it’s accessing when it gets this error.

I figured out the problem. I’m documenting it here in case anyone else get’s curious:

  1. It’s a certificate issue on my server. My server is a gitlab omnibus server, with both mattermost and the main gitlab server services running.

  2. The Gitlab instance sends messages (in my case over https), which are “webhooks”, in the same format that Slack webhooks might be, in fact, it’s the Slack service (gitlab’s terminology for a configurable per-project service endpoint) in each of my projects that is generating these job failures. It’s a bit hard to figure out that from the SideKiq UI.

  3. To solve the problem, I went in and made sure that I could successfully execute a command like this without getting the same certificate failure problem:

curl -i -X POST -d 'payload={"text": "Hello, this is some text.\nThis is more text."}' <https://mymattermostsite.mydomain.com/hooks/oazuwrybtpf85mqs3qu4sjrhho --cacert /opt/gitlab/embedded/ssl/certs/cacert.pem> 
  1. To fix it so curl worked, I could either workaround (not useful!) by dropping certificate validation (not possible inside the current Slack service configuration, and probably not a good idea to even permit it). To fix it, in my case I changed this line in my /etc/gitlab/gitlab.rb:
nginx['ssl_client_certificate'] = "/etc/ssl/certs/ca-bundle.pem" # replaced previous ssl client certificate bundle.
  1. The actual process of producing the correct ca-bundle.pem content for the above is complex. For me, it had to do with the fact that the wildcard certificate my organization gave me did not contain its own trust-chain bundled with it. I did a pile of googling and found people who generated bundles that contain all intermediates, plus the root CA, plus my own site’s wildcard’s CA. So CERTS are signed by CAs, and those CAs have trust relationships. Unless the whole chain is found, you get that certificate verify failed error.

That worked for a while and stopped working. Back to stumped.