Gitlab ELK

Hi there,

Recently I have been trying to build a log analysis system for my self-hosted GitLab Community Edition servers so I can monitor performance better. I set up ELK, but I am struggling to find useful information in the GitLab logs and show it in Kibana.

For example, I want to know who has pushed to GitLab over a given period of time. I checked the logs and only found that the Gitaly log records an entry with “grpc.method” set to CommitDelta each time I push. However, it only records the project path and no other information.

I could write to a log from a hook, but I was wondering whether anyone here has tried something similar so that I can reuse their work.

In the end, I found everything I needed by myself. Problem solved.

@Jie Bravo! :clap:
I had just opened an issue to look into how GitLab customers would want to use something like this. The idea came from another customer.

Anything you would want to add? Anything that would make the integration easier?

Hi John,

It is good that someone is finally working on this. Will this feature be in the Community Edition as well?

As I am using GitLab Community Edition and don’t have the Elasticsearch integration, I installed Filebeat on the GitLab servers and send the logs over SSL to Logstash on a separate ELK server.

In the Filebeat configuration, I added two new fields, log_group and log_id, for each GitLab log, as below:

log_group: gitlab_research
log_id: audit_json

These fields are used to differentiate the GitLab instances I manage. I will also want to monitor other non-GitLab instances, which is why log_group is important to me. I read the following logs to get all the information I want:

user over ssh commit: /var/log/gitlab/gitlab-shell/gitlab-shell.log
user over https commit: /var/log/gitlab/gitlab-rails/production_json.log
user login: /var/log/gitlab/gitlab-rails/audit_json.log
user geoinformation: /var/log/gitlab/gitlab-rails/production_json.log
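
A Filebeat configuration along these lines could look roughly like the sketch below. It is only an illustration, not the exact file used here: the output file name, the Logstash host elk.example.com:5044 and the CA path are placeholders, and the use of fields_under_root and json.keys_under_root is an assumption (it makes log_group, log_id and the JSON keys top-level fields, which is what a Logstash condition such as [log_group] expects). It uses the log input type; newer Filebeat releases prefer filestream.

```shell
#!/bin/sh
# Sketch only: writes an example Filebeat config into the current directory.
# Host name and CA path are placeholders, not values from this thread.
cat > filebeat.gitlab.example.yml <<'EOF'
filebeat.inputs:
  - type: log
    paths:
      - /var/log/gitlab/gitlab-rails/audit_json.log
    json.keys_under_root: true      # lift the decoded JSON keys to the top level
    fields:
      log_group: gitlab_research
      log_id: audit_json
    fields_under_root: true         # assumption: put log_group/log_id at the top level

  - type: log
    paths:
      - /var/log/gitlab/gitlab-rails/production_json.log
    json.keys_under_root: true
    fields:
      log_group: gitlab_research
      log_id: production_json
    fields_under_root: true

output.logstash:
  hosts: ["elk.example.com:5044"]                         # placeholder host
  ssl.certificate_authorities: ["/etc/filebeat/ca.crt"]   # placeholder path
EOF
```

The other log files would follow the same pattern, one input per file with its own log_id.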

I created a disk usage log (as I couldn’t find this information in the GitLab logs) to get the remaining disk space on the Gitaly repository storage, like below.


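df -P | grep data | jq -R -c '
      # with -R each input line arrives as a raw string
      split("\n") |
      .[] |
      # keep only lines that start with a device path
      if test("^/") then
        gsub(" +"; " ") | split(" ") | {time: (now | todate), mount: .[0], spacetotal: .[1], spaceavail: .[3], mounton: .[5]}
      else
        empty
      end'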
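
On hosts where jq is not installed, a similar disk usage line could be produced with awk alone. The following is a sketch of that variant, not the command used above: the function name disk_usage_json and the log path in the closing comment are made up for illustration. Unlike the jq version, it writes the sizes as JSON numbers rather than strings, and it assumes mount points contain no spaces.

```shell
#!/bin/sh
# Hypothetical helper: reads `df -P` output on stdin and prints one JSON
# line per filesystem whose device starts with "/" and whose line matches
# the pattern given as the first argument.
disk_usage_json() {
  pattern="$1"
  now="$(date -u +%Y-%m-%dT%H:%M:%SZ)"   # ISO 8601 timestamp in UTC
  awk -v pat="$pattern" -v now="$now" '
    $1 ~ /^\// && $0 ~ pat {
      # df -P columns: device, total, used, available, capacity, mount point
      printf "{\"time\":\"%s\",\"mount\":\"%s\",\"spacetotal\":%s,\"spaceavail\":%s,\"mounton\":\"%s\"}\n",
             now, $1, $2, $4, $6
    }'
}

# Example use from cron (the log path is a placeholder):
#   df -P | disk_usage_json data >> /var/log/gitlab-diskusage/diskusage_json.log
```

Filebeat can then tail that file like any other JSON log, with its own log_id.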

On the ELK server, I made the following changes to Logstash: 1. identify which GitLab instance the event comes from; 2. change the fields into the form I want: parse the date field into @timestamp, pass the remote IP through geoip so that I can use a geo map, and convert the disk space fields to integers, since Logstash treats them as strings by default.

filter {
  if [log_group] =~ "gitlab_research" {
    # use the time recorded in the log line as the event timestamp
    date {
      match => [ "time", "ISO8601" ]
      target => "@timestamp"
    }
    if [log_id] =~ "production_json" {
      # look up the location of the client IP for the geo map
      geoip {
        source => "remote_ip"
      }
    }
    if [log_id] == "diskusage_json" {
      mutate {
        convert => {
          "spaceavail" => "integer"
          "spacetotal" => "integer"
        }
      }
    }
  }
}

It is important to have some way to debug when things go wrong, which is always. Adding stdout to the output helps identify a lot of problems.

output {
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}

I didn’t change anything else in Elasticsearch or Kibana other than building the charts. I created GitLab user logins over time, GitLab geo information, GitLab user commits over HTTPS, GitLab user commits over SSH, and GitLab available disk space on the user repository.

It’s working okay so far, but I am still testing the accuracy before I apply these settings to other GitLab instances. The final goal is to automate all these steps with Ansible.

Hope this helps.

I will advocate for this being in Community Edition. I can’t think of any reason for it to be in Enterprise Edition only. It is simply convenient that some Enterprise Edition customers with Advanced Search enabled would already have a functioning ELK stack.

Thank you for the details. This is definitely helpful!
