Large project import from GitHub fails

The import of a large GitHub project fails after several hours without a clear error message. I have tried to find the cause, but so far I haven’t been able to identify it. Any idea where to look and which settings to adjust to fix this? Thanks!

[Screenshot 2023-08-27 at 3.26.39 PM]

It may be relevant that I checked all import options because I want to bring over as much information as possible.

Just checking, @fjahr: Have you already reviewed the documentation on this issue?

Which version of GitLab is involved (navigate to /help), and how is GitLab installed (self-managed packages, containers, cloud-native operator/Helm chart, or SaaS)?

Can you share the URL of the GitHub project being imported? The numbers in the screenshot look quite large; maybe API rate limits are causing the failure.
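To see why a repository of this size can keep an import busy for hours, a rough back-of-the-envelope sketch may help. It assumes GitHub’s limit of 5,000 authenticated REST requests per hour, a page size of 100, and (as a loose assumption about how the importer works) one extra request per pull request, for example to fetch its reviews. The object counts are hypothetical placeholders, not the numbers from the screenshot.

```python
import math

# Hypothetical counts: substitute the figures shown on your own import page.
HYPOTHETICAL_COUNTS = {"pull_requests": 20_000, "issues": 8_000}


def estimate_requests(counts, per_page=100, per_item_followups=None):
    """Rough lower bound on the number of GitHub REST calls an import needs."""
    per_item_followups = per_item_followups or {}
    total = 0
    for kind, n in counts.items():
        total += math.ceil(n / per_page)              # paginated list calls
        total += n * per_item_followups.get(kind, 0)  # assumed extra calls per object
    return total


def estimate_hours(requests, limit_per_hour=5000):
    """Hours needed if the requests are spread evenly under the hourly limit."""
    return requests / limit_per_hour


requests = estimate_requests(HYPOTHETICAL_COUNTS, per_item_followups={"pull_requests": 1})
print(requests, round(estimate_hours(requests), 2))
```

With these placeholder numbers it prints `20280 4.06`, i.e. roughly four hours of requests before retries or other object types are counted.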

Hi, thanks for your message and sorry for the late reply, I was travelling for a while.

I have been running v16.3.0-ee, and it’s a self-managed installation. I have mostly followed the Ubuntu steps here: https://about.gitlab.com/install/#ubuntu

The GitHub project is bitcoin/bitcoin (“Bitcoin Core integration/staging tree”), so yes, it’s pretty big. How could I check whether an API rate limit was hit and caused the failure?
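One way to check is to ask GitHub directly: its REST API exposes a `GET /rate_limit` endpoint that reports the remaining quota per bucket, and querying it does not itself count against the limit. The sketch below assumes you use the same personal access token that you gave the GitLab importer; if `remaining` is 0 shortly after the import stalls, rate limiting is a plausible suspect.

```python
import json
import time
import urllib.request

RATE_LIMIT_URL = "https://api.github.com/rate_limit"


def fetch_rate_limit(token):
    """Query GitHub's /rate_limit endpoint with the given token and return the JSON body."""
    req = urllib.request.Request(
        RATE_LIMIT_URL,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def summarize_core(payload, now=None):
    """Return (remaining, limit, seconds_until_reset) for the 'core' REST bucket."""
    core = payload["resources"]["core"]
    now = time.time() if now is None else now
    return core["remaining"], core["limit"], max(0, int(core["reset"] - now))
```

Usage, with the token in a hypothetical `GITHUB_TOKEN` environment variable: `print(summarize_core(fetch_rate_limit(os.environ["GITHUB_TOKEN"])))` (after `import os`). Parsing is kept separate from the network call so it can be checked against a saved response.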

Thanks, I hadn’t seen it before. I have taken a look at that page, though I am not sure it really applies to my situation: the sync process doesn’t just encounter errors, it gets completely stuck. The page is unclear on whether such errors can cause the process to get stuck. Where would I need to look to see those errors?

Also, one of the suggestions on the page is to use the alternative import method, which I have already been using (see screenshot). But I have not tried the github_importer_lower_per_page_limit feature flag yet; I will give it a shot.
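On a self-managed instance, one way to enable a feature flag for a single group is GitLab’s Features API (`POST /api/v4/features/:name` with `value` and `group` parameters), which requires an administrator’s access token. The sketch below only builds the request; the instance URL, token, and group path are hypothetical placeholders.

```python
import urllib.parse
import urllib.request


def build_enable_flag_request(gitlab_url, admin_token, flag, group_path):
    """Build a POST /api/v4/features/:name request that enables `flag` for one group."""
    url = f"{gitlab_url.rstrip('/')}/api/v4/features/{urllib.parse.quote(flag)}"
    # Form-encoded body: value=true turns the flag on, group scopes it to that group path.
    body = urllib.parse.urlencode({"value": "true", "group": group_path}).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"PRIVATE-TOKEN": admin_token},
    )
```

Sending it would look like `urllib.request.urlopen(build_enable_flag_request("https://gitlab.example.com", token, "github_importer_lower_per_page_limit", "my-group"))`, where `gitlab.example.com` and `my-group` stand in for your own instance and group.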

@bbehr Hm, OK, so I created a group (I didn’t have one before), enabled the github_importer_lower_per_page_limit feature flag for that group, and imported the project from GitHub into it. The import still didn’t finish, but maybe this gives some further insight into what is going wrong?

[Screenshot 2023-10-07 at 10.05.01 AM]

I would also argue in favour of carefully checking the API’s speed (rate) limits. Given the magnitude of the numbers you presented, I suspect they led to exactly this kind of failure. I would also advise reading the documentation of the problem itself.

@HarisonDrake Hi, which documentation do you mean when you say “I would also advise reading the documentation of the problem itself”? Are you sure that it’s related to a speed limit? How would I be able to confirm that suspicion? Thanks!

So I spent some time today trying to find any helpful information in the various log files, but I was not successful. Weirdly, the log described on the GitLab “Log system” documentation page is empty, for example, and in others, like the Sidekiq logs, I could see some importer-related entries, but none of them seemed to be related to any sort of error. It would be great if someone could point me in the right direction on how to debug this.
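Because Sidekiq logs on a package-based install are structured JSON (one object per line), filtering them programmatically can be easier than reading them by eye. The sketch below assumes the log paths listed in `CANDIDATE_LOGS`, that GitHub import workers have `GithubImport` in their class name, and that failures show up as an `ERROR`/`FATAL` severity, a `job_status` of `fail`, or an `exception.*` key. All of these are assumptions to verify against your own log lines.

```python
import json

# Assumed log locations for a Linux package (Omnibus) install; adjust as needed.
CANDIDATE_LOGS = [
    "/var/log/gitlab/sidekiq/current",
    "/var/log/gitlab/gitlab-rails/importer.log",
]


def importer_problems(lines, needle="GithubImport"):
    """Yield parsed JSON log entries that mention `needle` and look like failures."""
    for line in lines:
        line = line.strip()
        if needle not in line or not line.startswith("{"):
            continue  # skip unrelated or non-JSON lines cheaply
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        failed = (
            str(entry.get("severity", "")).upper() in {"ERROR", "FATAL"}
            or entry.get("job_status") == "fail"
            or any(key.startswith("exception") for key in entry)
        )
        if failed:
            yield entry
```

Usage: for each existing path in `CANDIDATE_LOGS`, run `with open(path) as fh:` and print `entry.get("class")` and `entry.get("exception.message")` for every entry from `importer_problems(fh)`. If nothing matches, widening `needle` to `"import"` may surface related jobs.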

@fjahr: A GitLab teammate made me aware of the Congregate Utility, which GitLab’s Professional Services team maintains as an open source project. My teammate notes that “it might handle large imports better than the importer process available via the UI.” See what you think!

Thanks @bbehr, I have been experimenting with Congregate for a few days now. I am still running into some issues, e.g. Congregate issue #995, “Import of public repository from Github.com fails”, but I think there is a good chance this will work for us! Is there a way to get in contact with someone from the team to ask a few questions? I was able to debug a few earlier issues, but I have been stuck on this latest one for a while now.