Need a data warehouse for a self-hosted GitLab

We have a self-hosted GitLab instance, and management recently asked us which people are using GitLab and how much they are using it.

As a first thought, we want to call the APIs to retrieve the JSON data about the

  • project,
  • namespace,
  • project membership /api/v4/projects/$num/members/all,
  • etc.
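A minimal sketch of the retrieval step, assuming a personal access token with the `read_api` scope and GitLab's offset pagination (`page`/`per_page` query parameters, with the next page number returned in the `X-Next-Page` response header). `BASE_URL`, `TOKEN`, and the function names are placeholders chosen for illustration, not part of any GitLab tooling:

```python
import json
import urllib.request
from typing import Callable, Iterator, Optional, Tuple

# Hypothetical values: replace with your own instance URL and token.
BASE_URL = "https://gitlab.example.com/api/v4"
TOKEN = "<personal-access-token>"

Page = Tuple[list, Optional[str]]  # (items on this page, next page number or None)


def http_get_page(url: str) -> Page:
    """Fetch one page of results from the GitLab REST API."""
    req = urllib.request.Request(url, headers={"PRIVATE-TOKEN": TOKEN})
    with urllib.request.urlopen(req) as resp:
        items = json.load(resp)
        # The header is empty on the last page; treat that as "no next page".
        next_page = resp.headers.get("X-Next-Page") or None
    return items, next_page


def fetch_all(path: str, get_page: Callable[[str], Page] = http_get_page) -> Iterator[dict]:
    """Yield every item of a paginated collection such as /projects."""
    page: Optional[str] = "1"
    while page:
        url = f"{BASE_URL}{path}?per_page=100&page={page}"
        items, page = get_page(url)
        yield from items
```

With this in place, `fetch_all("/projects")` would walk all projects visible to the token, and `fetch_all(f"/projects/{project_id}/members/all")` would walk the members of one project. The `get_page` parameter exists only so the loop can be tried without a live instance.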

We will need to parse the JSON data and load the attributes we need into table columns in the data warehouse. We have experience with Microsoft SSIS (SQL Server Integration Services), so we are thinking about using this toolkit.
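To illustrate the parsing step, here is a rough sketch that flattens the member JSON into rows and loads them into a table. It uses SQLite from the Python standard library purely as a stand-in for the real warehouse; the table name `project_member` and the choice of columns are assumptions for the example, while `id`, `username`, `state`, and `access_level` are fields returned by the members endpoint:

```python
import sqlite3
from typing import Iterable, Iterator


def member_rows(project_id: int, members: Iterable[dict]) -> Iterator[tuple]:
    """Pick the wanted attributes out of each member JSON object."""
    for m in members:
        yield (project_id, m["id"], m["username"], m.get("state"), m.get("access_level"))


def load_members(conn: sqlite3.Connection, project_id: int, members: Iterable[dict]) -> None:
    """Create the (hypothetical) target table if needed and upsert the rows."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS project_member (
               project_id   INTEGER,
               user_id      INTEGER,
               username     TEXT,
               state        TEXT,
               access_level INTEGER,
               PRIMARY KEY (project_id, user_id)
           )"""
    )
    # INSERT OR REPLACE keeps one row per (project, user) across repeated loads.
    conn.executemany(
        "INSERT OR REPLACE INTO project_member VALUES (?, ?, ?, ?, ?)",
        member_rows(project_id, members),
    )
    conn.commit()
```

The same shape (one function that maps JSON to tuples, one that writes them) should carry over to SQL Server, whether the load is done from a script or from an SSIS package.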

Once we have the data in SQL, we will make a join query referencing auxiliary data from the corporate LDAP (Active Directory), so we know which departments the users belong to.
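The join we have in mind could look roughly like the sketch below. It again uses an in-memory SQLite database as a stand-in, with made-up sample rows, and it assumes the GitLab username matches the account name exported from Active Directory; the `ldap_user` table and its columns are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE project_member (project_id INTEGER, username TEXT);
    CREATE TABLE ldap_user (username TEXT PRIMARY KEY, department TEXT);
    """
)
# Illustrative data only.
conn.executemany(
    "INSERT INTO project_member VALUES (?, ?)",
    [(1, "alice"), (2, "alice"), (2, "bob"), (3, "carol")],
)
conn.executemany(
    "INSERT INTO ldap_user VALUES (?, ?)",
    [("alice", "Engineering"), ("bob", "Engineering"), ("carol", "Finance")],
)

# Users and distinct projects per department.
QUERY = """
SELECT l.department,
       COUNT(DISTINCT m.username)   AS users,
       COUNT(DISTINCT m.project_id) AS projects
FROM project_member AS m
JOIN ldap_user AS l ON l.username = m.username
GROUP BY l.department
ORDER BY l.department
"""
report = conn.execute(QUERY).fetchall()
for department, users, projects in report:
    print(department, users, projects)
```

If the usernames do not line up between the two systems, the join key would need to be something else that both sides carry, such as the email address.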

Our Question:

Before reinventing any wheels, we want to double-check whether there are any built-in tools available for this purpose.

Also, are there any other tools popular with GitLab analytics besides SSIS? We highly appreciate any hints and suggestions.

@yzgoa,

Without knowing more about what exactly you’re looking for, I can say there are a number of dashboards available for the instance or for groups in the Analyze area. I’d explore those first. If you are on an Ultimate license, you might also consider Insights, which gives you more granular control over the data displayed.

I hope this helps!

-James H, GitLab Product Manager, Analyze:Product Analytics

We are running an instance of GitLab Community Edition 15.1.6.

I couldn’t find some of the reports mentioned in the online documentation:

Under “Admin Area > Analytics”, I see two options, “DevOps Reports” and “Usage Trends”, but I was unable to find any other reports. For example, the documentation page “View repository analytics” says “Analyze > Repository analytics”, and I did not find any button named “Analyze”. I’m not sure whether we need a higher license tier to enable the feature or a configuration change in the /etc/gitlab/gitlab.rb file. Please point out anything I missed here.

More specifically in our report:

Given a list of users, we need to figure out how many projects they are involved in. In GitLab terms, this covers two situations: 1) the user owns the project, or 2) the user is a member of the group containing the project.

As a simple example, user 1 is involved with projects A and B, and user 2 is involved with projects B and C. So, in summary, the two users are involved with 3 projects in total.
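In code, the count we are after is the size of the union of each user's project set. A small sketch, assuming we already have (user, project) pairs, for instance collected from `/members/all`, which also lists members inherited from ancestor groups; the function name is just for illustration:

```python
from typing import Iterable, Set, Tuple


def projects_for_users(memberships: Iterable[Tuple[str, str]], users: Iterable[str]) -> Set[str]:
    """Return the distinct projects that any of the given users is involved in."""
    wanted = set(users)
    return {project for user, project in memberships if user in wanted}


# The example above: user 1 -> A, B; user 2 -> B, C.
memberships = [("user1", "A"), ("user1", "B"), ("user2", "B"), ("user2", "C")]
involved = projects_for_users(memberships, ["user1", "user2"])
print(len(involved))  # 3 (projects A, B, and C; B is counted once)
```

In SQL, the equivalent would be a `COUNT(DISTINCT project_id)` filtered to the users in the list.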

Thank you again for your help so far, and we highly appreciate hints and suggestions.

@yzgoa

Group Insights require an Ultimate license, so you’ll probably need to pull the data out of the APIs into a data warehouse, as you originally suggested, to do the analysis.

There are more graphs available at the project level in CE. I haven’t looked in depth at what was available in 15.1, but I assume some things have changed between then and the current version (16.4 as of today).

Good luck getting your data!

-James H, GitLab Product Manager, Analyze:Product Analytics

While GitLab itself provides a powerful set of features and APIs for data retrieval, there are also third-party tools and solutions that can help streamline this kind of reporting. Here are some considerations and alternatives:

GitLab Analytics Built-in Tools:

GitLab itself provides some basic analytics and reporting features that can be useful for understanding project activity, such as:

     1. Built-in Reports: GitLab provides various built-in reports like Burndown Charts, Cycle Analytics, and Contribution Analytics.
     2. Activity Feeds: GitLab has activity feeds for each project and user, which can be used to monitor recent activity.
     3. Audit Events: GitLab records audit events that can be used for security and compliance purposes.

@yzgoa Here’s another third-party tool for tracking metrics from your GitLab data: Analytics & Reports by Screenful. It supports 16 chart types and automatically generates charts and metrics with minimal configuration. It works with both GitLab cloud and self-hosted GitLab.