DataDog integration issue (GitLab & GitLab Runner - No data received)

Hello All,

I have a GitLab CE Core (self-managed) server installed on GCP (CentOS 7).
I also have a DataDog (Pro) subscription (around 100 nodes on infrastructure metrics monitoring) with datadog-agent v6.15.1.
I pressed “Install” for the GitLab & GitLab Runner integrations. Then I enabled the Datadog agent checks (I just copied ‘/etc/datadog-agent/conf.d/gitlab.d/conf.yml.example’ to ‘conf.yml’, and did the same for GitLab Runner) and restarted the Datadog agent.
Oh, and of course I’ve entered:
‘- gitlab_url: http:///?token=’
and ‘prometheus_endpoint: http://localhost:9090/metrics’ for both GitLab & GitLab Runner.
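
For reference, the relevant part of my gitlab.d/conf.yml looks roughly like this (the host and token below are placeholders, not the real values; the gitlab_runner one is analogous):

    init_config:

    instances:
        # <GITLAB_HOST> and <TOKEN> are placeholders
      - gitlab_url: http://<GITLAB_HOST>/?token=<TOKEN>
        prometheus_endpoint: http://localhost:9090/metrics
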
After running ‘sudo datadog-agent status’ I can see 154 metrics in the GitLab check and 36 metrics in the GitLab Runner check.
But anyway, on the DataDog Integrations page my installed GitLab and GitLab Runner integrations show a yellow square saying “No data received”.
And I can’t see most of the useful metrics concerning GitLab repositories, access, etc.
When I try to enable additional metrics in the conf.yml files, they still don’t show up.
Has anybody had (or is having) similar issues?
Thanks for any help!

P.S. The DataDog support people just asked me to repeat the integration steps and keep sending useless responses (about once a week)… which is really sad…

Alexey Budkevich
Senior DevOps Engineer
Real Estate Webmasters

Hi,

seems that the Datadog integration just scrapes the Prometheus endpoint from the observability suite, described at https://docs.datadoghq.com/integrations/gitlab/

I’d investigate on the Prometheus end and check whether the listed metric names from their doc tables are accessible and exported.
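
A quick way to check that on the GitLab box (the URL is whatever you configured as the prometheus_endpoint; ‘gitlab’ is just an example search term - use real metric names from the doc tables):

    # does the scraped endpoint actually export the documented metric names?
    curl -s http://localhost:9090/metrics | grep -i 'gitlab' | head
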

I don’t know exactly how the datadog agent works (closed source is always hard to gather insights from), but it would be worthwhile restarting it, or checking whether it has some debug logging or tracing for the metrics it sends out - https://docs.datadoghq.com/agent/troubleshooting/
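
If I remember the v6 agent CLI correctly, you can also run a single check by hand, which usually tells you more than the status summary:

    # run only the gitlab / gitlab_runner checks once and print what they would send
    sudo -u dd-agent datadog-agent check gitlab
    sudo -u dd-agent datadog-agent check gitlab_runner
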

Cheers,
Michael

Hi,

Thanks for the quick response.
I restart my Datadog agent right after each attempt.
As for the agent logs - I haven’t found anything useful in them yet (it looks like a normal collecting state).
I have a few guesses about the sources of these issues:

  1. GCP IAM role.
    The role created for Datadog might not have enough permissions to check some resources spawned within GCP (the role was created according to the Datadog integration guide).
  2. The default GitLab <-> Prometheus module integration.
    In the Prometheus conf I have all exporters enabled by default. The main Prometheus exporter uses the standard API at ~:9090/metrics and the gitlab-rails exporter uses ~:8080/-/metrics.
    When I call ‘curl ~:8080/-/metrics’ I get all the GitLab metrics I want to collect. However, the datadog-agent doesn’t show any metrics in its status when I change that endpoint in the agent’s conf.yml file (see the config sketch after this list).
  3. The GitLab /etc/gitlab/gitlab.rb conf file.
    I think it should be possible to configure an allowance for the Prometheus module to get the GitLab metrics, etc.
    I’ve put "gitlab_rails['monitoring_whitelist'] = ['localhost']" into that file and reconfigured GitLab as written in the GitLab help. It didn’t help, and the GitLab docs have no more useful info about it.
  4. DataDog needs some additional settings, creds, permissions, etc. for the GitLab/GitLab Runner integrations. E.g. adding gitlab.py to the ‘/etc/datadog-agent/checks.d/’ directory with dd-agent:dd-agent ownership, or something similar?! I recently added the two .py files from ‘https://raw.githubusercontent.com/DataDog/integrations-core/master/gitlab/datadog_checks/gitlab/’ (and the same for gitlab_runner), but nothing changed… (see the ownership check after this list)
    Do I need to re-install both integrations on the Datadog integrations page now? (I’ll do it soon, because who knows)…
  5. I’m really disappointed in DataDog lately… Nothing personal, but their support is something else…
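
Here is roughly the conf.yml change from point 2, i.e. the variant where the check stops showing metrics (host and token are placeholders):

    instances:
        # pointing the check at the gitlab-rails exporter instead of the main Prometheus endpoint
      - gitlab_url: http://<GITLAB_HOST>/?token=<TOKEN>
        prometheus_endpoint: http://localhost:8080/-/metrics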
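
And the ownership check from point 4 is nothing fancy - just making sure the manually dropped-in check files belong to the agent user (the filenames gitlab.py / gitlab_runner.py are my assumption; adjust to whatever was actually copied):

    # verify ownership of the manually added check files
    ls -l /etc/datadog-agent/checks.d/
    sudo chown dd-agent:dd-agent /etc/datadog-agent/checks.d/gitlab.py
    sudo chown dd-agent:dd-agent /etc/datadog-agent/checks.d/gitlab_runner.py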

Hi,

I truly understand that the problem is frustrating, but blaming a vendor’s support here doesn’t help. Better to tell them directly how they can improve their workflow - as a paying customer.

In terms of the problem, I’d suggest cutting it down to the smallest environment possible. You mention some additional GitLab Runner integration parts - remove them temporarily and troubleshoot only the agent parts for the base metrics.

I don’t know whether the agent uses TLS encryption as a transport to their API, but you can at least check the message transport via tcpdump. If you can see traffic from Prometheus to the agent and from the agent to Datadog, you’re one step further with the debugging.
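
Something along these lines (the interface names are assumptions, adjust them to your setup):

    # is the agent actually pulling from the local Prometheus/GitLab endpoints?
    sudo tcpdump -nn -i lo 'port 9090 or port 8080'

    # and is anything leaving the box towards Datadog's intake? (HTTPS assumed)
    sudo tcpdump -nn -i eth0 'tcp port 443'
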

Sourcing from their docs, I can see that you can put the agent into debug mode. Try that and grep the logs for the metrics it sends: https://docs.datadoghq.com/agent/troubleshooting/debug_mode/?tab=agentv6v7
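
Roughly like this (the log path assumes the default Linux install location):

    # per the linked doc: set "log_level: debug" in /etc/datadog-agent/datadog.yaml,
    # then restart the agent and grep its log for the gitlab metrics
    sudo systemctl restart datadog-agent
    grep -i gitlab /var/log/datadog/agent.log | tail
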

If that doesn’t help with the problem, collect all these findings and discuss with the Datadog support.

Cheers,
Michael

Hey,

Thanks for your suggestions. I’ve been writing to them about once a week with a “Can you help me, please?”, but I keep getting responses like “send us a flare -> we can see that you re-installed your integrations -> we can also see that you restarted your Datadog agent multiple times -> and then silence”. Oh, man! Is this paid support? Really?
As for your TLS encryption suggestion - that could be an idea, thanks!
Exactly one year ago I had a big issue with the DataDog <-> RabbitMQ on K8s integration,
and I resolved it 100% by myself after a long, deep debugging session.
In the conf.yml file I had to use https instead of http (and different ports) for the API.
And the well-paid Datadog support didn’t help me at that time either…