Which endpoint should be used to programmatically determine the project ID?

Problem to solve

Hello, I am currently partially automating the development process in my company. An essential component is a Python script that creates a merge request in GitLab when certain conditions are met. To do this via the REST API, you need the project ID.

I am now looking for a way to determine this project ID programmatically. It would be ideal for our use case if it were possible to add a filter to the search endpoint or another one on the key/value pair “ssh_url_to_repo”. The script could then reliably determine the appropriate value using “git remote get-url origin” and thus determine the project ID.

Unfortunately, I have not yet found a way for the REST API to filter for the ssh_url_to_repo on the server side. Possible workarounds are to have all projects returned and to search for the appropriate key/value pair yourself, which is quite a lot of data to determine an integer, or to process the ssh URL yourself and use the project path instead of the project ID. But according to the documentation, the path is not reliable when a project is moved.
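For the second workaround, here is a minimal sketch of turning the output of “git remote get-url origin” into a namespaced project path. The function name `remote_to_project_path` is hypothetical, and the sketch assumes the remote is either a URL with a scheme (https://, ssh://) or the scp-like `git@host:path` form, optionally ending in `.git`.

```python
from urllib.parse import urlparse


def remote_to_project_path(remote_url: str) -> str:
    """Return the namespaced path (e.g. 'group/project') of a Git remote URL."""
    if "://" in remote_url:
        # https://host/group/project.git or ssh://git@host/group/project.git
        path = urlparse(remote_url).path
    elif ":" in remote_url:
        # scp-like syntax: git@host:group/project.git
        path = remote_url.split(":", 1)[1]
    else:
        raise ValueError(f"unrecognised remote URL: {remote_url!r}")
    path = path.strip("/")
    # Drop only a trailing ".git" so that dots inside the path are preserved.
    if path.endswith(".git"):
        path = path[: -len(".git")]
    return path
```

The result is still only a path, so it carries the caveat mentioned above: it identifies the project only as long as the local remote URL is current.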

Which endpoint should be used to programmatically determine the project ID and what information that can be obtained with the Git command is needed to get a clear answer that will give you exactly the one project that is currently checked out?

Unless you know the project ID from the GitLab UI, you need to walk over the list of projects, and match their name. Since this method requires pagination, an abstracted library is recommended, for example, python-gitlab.

Here’s an example that searches for a group name, which is similar to searching for a project name but uses a different endpoint: python_gitlab_pagination.py, in the project Developer Advocacy at GitLab / use-cases / GitLab API / GitLab API with Python.

More examples and scripts are available in Efficient DevSecOps workflows: Hands-on python-gitlab API automation.

I’m not following this use case. Can you elaborate in more detail what you are trying to achieve here? I.e. a script example.

I want to avoid double bookkeeping and not check the project ID into Git itself. In my humble opinion, Git has already saved the relationship to the project in the project-specific URL of origin. Depending on whether the repo was cloned with SSH or HTTP, the same information must be there as in the key/value pairs “ssh_url_to_repo” or “http_url_to_repo”. Unfortunately, in the “GET /projects” request you can only search for the name and not the URL, which forces someone like me, who does not save the project ID via a side channel, to iterate over all projects until the unique project URL is found.
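A possible middle ground between guessing and crawling everything is to narrow the list with the name search and then compare the clone URLs on the client side. The sketch below only builds the query string and filters a list of already-fetched project dictionaries. The helper names are hypothetical, and it assumes that `search`, `simple` and `per_page` behave as described in the GitLab projects API documentation and that the `simple` representation still includes both URL fields, which should be verified against your GitLab version.

```python
from urllib.parse import urlencode


def candidate_query(project_path: str) -> str:
    """Build a query string for GET /projects narrowed by the last path segment."""
    name = project_path.rsplit("/", 1)[-1]
    return urlencode({"search": name, "simple": "true", "per_page": 100})


def match_by_remote(projects: list, remote_url: str):
    """Return the ID of the project whose clone URL equals remote_url, else None."""
    for project in projects:
        urls = (project.get("ssh_url_to_repo"), project.get("http_url_to_repo"))
        if remote_url in urls:
            return project["id"]
    return None
```

This reduces the traffic to the projects whose name matches, but the final comparison still happens on the client, so it does not replace a server-side URL filter.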

Therefore, I was hoping that there would be a nicer solution than writing down the project ID or listing all the projects that exist in order to search for the appropriate key on the client side. But this does not seem to be the case, which is a shame because the information on the remote URL is unique and available on both sides.

I’m sorry, I’m still not following. Why is the project ID important to be tracked in Git?

I have written a kind of PoC in pure Python, which hopefully makes my intention clear. I realize that this should not be used productively, but it might help you understand my point.

The assumption is that the user who runs the script is in a checked-out git repository.

#!/usr/bin/env python3
import argparse
import contextlib
import http
import itertools
import json
import os
import subprocess
import urllib.error
import urllib.parse
import urllib.request

from pathlib import PosixPath


BASE_URL = "https://gitlab.com"


def git_remote_url() -> str:
    command = ["/usr/bin/git", "remote", "get-url", "origin"]
    result = subprocess.run(command, capture_output=True)
    if result.returncode != 0:
        raise RuntimeError("git remote not found")
    return result.stdout.decode("utf-8").strip()


def gitlab_request(endpoint: str):
    headers = {
        "Private-Token": os.environ["GITLAB_TOKEN"],
        "Content-Type": "application/json",
    }

    url = f"{BASE_URL}/api/v4/{endpoint}"

    request = urllib.request.Request(url, headers=headers, method="GET")
    with urllib.request.urlopen(request) as response:
        output = response.read().decode('utf-8')
        if response.status == http.HTTPStatus.NO_CONTENT or output == '':
            return dict()
        return json.loads(output)

def project_id_by_url(url2repo: str, guessing=True) -> int:
    if url2repo.startswith("git@"):
        # scp-style SSH address: the project path follows the last colon
        _, url_path = url2repo.rsplit(':', 1)
    elif url2repo.startswith("http"):
        # http address found
        url_path = urllib.parse.urlparse(url2repo).path
    else:
        raise ValueError(
            "no idea how an address that does not start "
            "with http or git@ should be handled."
        )

    if guessing:
        # make an educated guess as what the project name might be.
        # note from
        # https://docs.gitlab.com/ee/api/rest/#namespaced-path-encoding
        # "a project’s path is not necessarily the same as its name."
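        # Strip the leading "/" (present for HTTP URLs) and only a trailing
        # ".git", so dotted project paths survive. Needs Python 3.9+.
        name = url_path.strip("/").removesuffix(".git")
        name_quoted = urllib.parse.quote_plus(name)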

        with contextlib.suppress(KeyError, urllib.error.HTTPError):
            project = gitlab_request(f"projects/{name_quoted}")
            if project["ssh_url_to_repo"] == url2repo or \
               project["http_url_to_repo"] == url2repo:
                return int(project["id"])

    # guessing failed, search in all Projects for url
    # This scales poorly. Here it is desirable to compare url2repo with
    # project["ssh_url_to_repo"] and project["http_url_to_repo"]
    # on the server side.
    for page in itertools.count(1):
        attributes = urllib.parse.urlencode(
            {"per_page": 100, "page": page}
        )

        projects = gitlab_request(f"projects?{attributes}")
        if len(projects) == 0:
            # all projects scanned, no match found
            break

        for project in projects:
            if project["ssh_url_to_repo"] == url2repo or \
               project["http_url_to_repo"] == url2repo:
                return int(project["id"])
    raise KeyError("no matching project found.")

def main():
    parser = argparse.ArgumentParser("PoC gitlab id via remote url")
    parser.add_argument("--no-guessing", dest="guessing", action="store_false",
                        help="don't try to guess the name of the GitLab project")

    args = parser.parse_args()
    git_url = git_remote_url()
    proj_id = project_id_by_url(git_url, guessing=args.guessing)

    print(f"{git_url} has project id {proj_id}")


if __name__ == "__main__":
    main()

The projects API is somewhat versatile in that you can look up a project via ID or the URL-encoded path of the project.

If you know the path of the project, you can obtain the ID via the REST API.

For example, https://gitlab.com/greg/testproject has a project ID of 58831028 in the UI.

I can also obtain the project ID via the REST API using the URL-encoded path to the project (greg/testproject becomes greg%2Ftestproject when URL-encoded).

As such, by making a GET request to https://gitlab.com/api/v4/projects/greg%2Ftestproject, I can obtain the project ID programmatically.
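A minimal sketch of that lookup in Python, using only the standard library. The helper names `project_lookup_url` and `fetch_project_id` are hypothetical, and the token handling assumes a personal access token passed in the PRIVATE-TOKEN header.

```python
import json
import urllib.request
from urllib.parse import quote


def project_lookup_url(project_path: str, base_url: str = "https://gitlab.com") -> str:
    """Build the single-project endpoint URL for a namespaced path."""
    # safe="" makes quote() encode "/" as %2F as well.
    return f"{base_url}/api/v4/projects/{quote(project_path, safe='')}"


def fetch_project_id(project_path: str, token: str) -> int:
    """Request the project and return the 'id' field of the JSON response."""
    request = urllib.request.Request(
        project_lookup_url(project_path),
        headers={"PRIVATE-TOKEN": token},
    )
    with urllib.request.urlopen(request) as response:
        return int(json.load(response)["id"])
```

Calling `project_lookup_url("greg/testproject")` yields the URL shown above; `fetch_project_id` raises `urllib.error.HTTPError` if the path does not resolve or the token lacks access.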


Hope this helps!


As you can see in my code, you are right for practical purposes, and I am aware of the possible use of the URL-encoded path. Unfortunately, your API documentation also says: “The path of a project is not necessarily the same as its name.”

So your suggestion is basically just a best effort approach and only works until a project is moved. Okay, I’ve never had this situation in production, but as far as I see, the only reliable way to determine the project ID is to crawl all projects until the URL matches.

In my opinion, there should be an attribute to pass the local URL as a filter so that GitLab does the comparison and only sends back the one matching project. Otherwise, the client can only make an educated guess, and if the assumption was wrong, the server will send a ton of traffic just to find a needle in the haystack of all projects.