For the last few weeks, I have repeatedly attempted to enable SSL for a custom domain with GitLab Pages (on gitlab.com), but the SSL certificate creation always fails. The custom domain is verified and HTTP access works, but the Let’s Encrypt SSL certificate creation won’t succeed. Among other things, I have clicked the “Retry” button on the custom domain record after the creation fails.
I have also tried numerous times deleting the custom domain record in the GitLab UI and going through the entire DNS and verification flow again (updating the TXT record with the new verification value each time), but that hasn’t changed anything: the certificate creation always fails.
Another way to test and verify: install certbot locally and request a certificate to see whether it works. Something like certbot certonly --standalone -d yourdomain.com --staple-ocsp -m firstname.lastname@example.org --preferred-challenges dns, using the ACME DNS-01 challenge.
$ sudo certbot certonly --standalone -d palousedata.co --staple-ocsp -m email@example.com --agree-tos --preferred-challenges dns
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Requesting a certificate for palousedata.co
None of the preferred challenges are supported by the selected plugin
Ask for help or search for solutions at https://community.letsencrypt.org. See the logfile /var/log/letsencrypt/letsencrypt.log or re-run Certbot with -v for more details.
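A note on the error above: it comes from the plugin choice rather than from the domain. certbot's standalone plugin answers challenges by starting a temporary web server, so it cannot satisfy a DNS challenge. A minimal sketch of the same test with the manual plugin, which does support the DNS challenge, could look like the following. The function name request_cert_dns01 and the CERTBOT override variable are made up for this example; the certbot flags are the ones from the command above, with --standalone swapped for --manual.

```shell
# Hypothetical helper: request a certificate via the DNS challenge using
# certbot's manual plugin. certbot will pause and ask you to create a
# _acme-challenge TXT record for the domain before it continues.
request_cert_dns01() {
  domain="$1"
  email="$2"
  # CERTBOT is an assumption of this sketch: set CERTBOT=echo to preview
  # the arguments without actually running certbot.
  ${CERTBOT:-sudo certbot} certonly --manual --preferred-challenges dns \
    -d "$domain" -m "$email" --agree-tos
}
```

Calling request_cert_dns01 palousedata.co email@example.com would then start the interactive flow. If the DNS challenge fails there too, that points at DNS resolution for the domain rather than at GitLab Pages.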
The caching server encountered NXDOMAIN or SERVFAIL when chasing glue records to find an authoritative nameserver, and this event has been cached. Even if the problem has been corrected, or the glue has been pointed somewhere else, the nameserver isn’t going to try asking for it again until an internal timer expires. Requesting a cache purge for the zone in question will usually reset it.
All resolvers have cached the wrong nameserver settings, and as such, continue returning SERVFAIL. That would explain the behavior I am seeing - and your resolver has cached a “good” result.
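To see whether different resolvers really disagree, one option is to send the same query to several of them and compare the status in each response header. The sketch below assumes dig is installed; the function names dig_status and compare_resolvers are made up for this example, and the resolver addresses you pass in are your own choice.

```shell
# Print the response status (NOERROR, SERVFAIL, NXDOMAIN, ...) found in
# dig output read from stdin.
dig_status() {
  sed -n 's/.*status: \([A-Z]*\).*/\1/p' | head -n 1
}

# Query one name against each resolver given and print "resolver<TAB>status".
compare_resolvers() {
  name="$1"
  shift
  for resolver in "$@"; do
    status="$(dig "@$resolver" "$name" A +time=3 +tries=1 2>/dev/null | dig_status)"
    # An empty status means no response header was received at all.
    printf '%s\t%s\n' "$resolver" "${status:-no-response}"
  done
}
```

For example, compare_resolvers palousedata.co 1.1.1.1 8.8.8.8 9.9.9.9 prints one line per resolver. A mix of NOERROR and SERVFAIL across resolvers would match the behavior described above.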
Here’s the transcript of my exchange with Njalla’s support. It turned out to be a DNSSEC issue on their end. Once they fixed it, the Gitlab Pages SSL cert creation succeeded, and I can now access my Gitlab page via HTTPS with my custom domain name.
Note: I had to put backticks around the links below due to a 10-links-per-message limit here on the forum. Sorry they’re not clickable.
Njalla support transcript
Title: Missing glue record even though I’m using Njalla’s nameservers
When using https://check.njal.la/dns/?name=palousedata.co to debug some DNS challenges that I recently had with the domain palousedata.co, the analysis is showing missing glue records. When I go to the palousedata.co domain configuration, it is set up to use Njalla’s nameservers, so my understanding from your FAQ (https://njal.la/faq/) is that I shouldn’t need to configure any glue records (and that they shouldn’t be missing).
For context, here’s the original issue that surfaced this finding (I’m the user whitecoop in the thread): https://forum.gitlab.com/t/gitlab-com-project-pages-ssl-certificate-creation-repeatedly-failing/93974?u=whitecoop
And here’s the specific message where the missing glue records were found: https://forum.gitlab.com/t/gitlab-com-project-pages-ssl-certificate-creation-repeatedly-failing/93974/4?u=whitecoop
I would appreciate any guidance you have. Thank you
Reply #1 (from Njalla)
Since you use our name servers there is no need for glue records
Reply #2 (from me)
So, the missing glue records shouldn’t have anything to do with the SERVFAILs that the gitlab.com troubleshooting got on multiple machines (in this message here: https://forum.gitlab.com/t/gitlab-com-project-pages-ssl-certificate-creation-repeatedly-failing/93974/4?u=whitecoop)?
From my machine, when I run
$ dig palousedata.co
I get a valid response, but from two different machines/hosts that the gitlab.com troubleshooter was using, digging palousedata.co gave back SERVFAIL
Reply #3 (from me)
Another way to put it, while I’m thinking about it: are the errors here (https://check.njal.la/dns/?name=palousedata.co) expected? It’s currently showing nameserver errors.
Reply #4 (from Njalla)
There was an issue with DNSSEC for your domain. We have now resubmitted the DNSSEC keys to the registry, and the domain should be working. We will review what might have caused the DNSSEC failure to make sure it does not happen again in the future.
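(Aside for anyone hitting the same symptom, not part of the support transcript.) A SERVFAIL caused by failed DNSSEC validation can often be told apart from other causes by repeating the query with dig's +cd flag, which asks the resolver to skip validation. The sketch below assumes dig is installed and that the resolver queried validates DNSSEC; 1.1.1.1 is used as the default on that assumption. The function names classify_dnssec and check_dnssec are made up for this example.

```shell
# Classify a pair of dig statuses: the first from a normal query, the
# second from the same query with +cd (DNSSEC checking disabled).
classify_dnssec() {
  normal="$1"
  unchecked="$2"
  if [ "$normal" = "SERVFAIL" ] && [ "$unchecked" = "NOERROR" ]; then
    echo "likely DNSSEC validation failure"
  elif [ "$normal" = "$unchecked" ]; then
    echo "same result with and without validation ($normal)"
  else
    echo "inconclusive ($normal vs $unchecked)"
  fi
}

# Run both queries against one resolver and classify the outcome.
# The default resolver is an assumption; pass another as the second argument.
check_dnssec() {
  name="$1"
  resolver="${2:-1.1.1.1}"
  with_validation="$(dig "@$resolver" "$name" A +time=3 +tries=1 2>/dev/null |
    sed -n 's/.*status: \([A-Z]*\).*/\1/p' | head -n 1)"
  without_validation="$(dig "@$resolver" "$name" A +cd +time=3 +tries=1 2>/dev/null |
    sed -n 's/.*status: \([A-Z]*\).*/\1/p' | head -n 1)"
  classify_dnssec "${with_validation:-none}" "${without_validation:-none}"
}
```

Running check_dnssec palousedata.co while the keys at the registry were out of sync would be expected to report a likely validation failure on a validating resolver, while a resolver that does not validate would keep answering normally. That may be why one machine saw a valid response and others saw SERVFAIL, though the thread does not confirm which resolvers were involved.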
Oh great, thanks for the transparency. I had a thought about DNSSEC but did not want to go there initially, after seeing the nameserver glue record errors as a first starting point. Context: I know how DNSSEC works, but debugging is hard, explaining it is even harder.
But it makes sense now - DNSSEC ensures the chain of trust of