How can I fork a project and its submodules?

Hey everybody,

I want to fork a project on GitLab that includes three submodules. Forking the main repo was easy: I simply clicked the “Fork” button in the UI. But I am now having trouble getting the submodules into my freshly forked repo, since the fork only gets me the main repo. I can fork the submodules on their own, but then they end up next to the forked main repo. Is there a way to fork recursively, so the whole repo structure including submodules stays intact? Is there a way to do so using the GitLab UI? I’m no Git or GitLab expert, so I hope there is a simple way to do it, and I apologize if I used the wrong terms for some of the things I described above.

Thanks in advance for your help, guys!

Hi,

I’m not aware of any recursive fork functionality. Also, keep in mind that a submodule’s URL and commit are recorded in the main repo’s history (the URL lives in the .gitmodules file). If you change the URL to point at your fork, your repository will always differ from the upstream repository, and keeping the two in sync is complicated.

If you really want to go that route, I’d suggest forking the other repos all at once, and then adding them to your repository in a dedicated branch next to the master branch, and persist them there.

Whenever you pull changes from the upstream repository, first pull them into master, and then merge master into your dedicated branch.

Another possibility would be to pull all the submodule repos directly into your source tree, to avoid the problems with submodules altogether. That still requires a separate branch, into which you merge the upstream master branch whenever changes occur.

Well, overall it sounds more complicated the more I think about it. If there is no specific reason to also fork the submodules, stick with them as upstream added them. Modifying/removing submodules is really not fun btw.

Cheers,
Michael

May I ask for your opinion on how you would work with a forked project containing submodules if you wanted to work on the main repo as well as on the submodules, and occasionally create pull requests on each of the repos?

The best solution that comes to mind would be to fork the main repo and the submodules separately and clone them into different directories. Afterwards, I would change the submodule references in the main repo to point to my forks.
I guess I would also need to ignore submodule changes in the main repo. Otherwise the upstream main repo’s submodule configuration will break. However, I am not sure how to implement that. I could manually “not add” changes to the submodule commit IDs when committing main repo changes. I don’t want to change the project’s .gitignore, though.

The main purpose, btw, is to figure out a working Maven configuration for the whole project. So I’d like to fiddle around with all parts and afterwards create pull requests for each of them at once.

Any suggestions?

Best wishes,
Stefan

Hi,

Since submodule pointers are stored in the Git commit, I would create a forked-main branch which updates the submodule pointers. This is your starting point for any local development going on.
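As a rough illustration of what "updating the submodule pointers" could involve on the URL side, here is a minimal sketch that rewrites the `url` entries of a .gitmodules file so they point at forks. The URLs, submodule names, and the helper name `rewrite_submodule_urls` are hypothetical and only serve as an example; it assumes the usual .gitmodules layout with one `url = ...` line per submodule.

```python
import re

# Hypothetical mapping from upstream submodule URLs to forked URLs.
URL_MAP = {
    "https://gitlab.example.com/upstream/lib-a.git":
        "https://gitlab.example.com/me/lib-a.git",
}

# Matches an (optionally indented) "url = <value>" line.
_URL_LINE = re.compile(r"^(\s*url\s*=\s*)(\S+)\s*$")


def rewrite_submodule_urls(gitmodules_text, url_map):
    """Return .gitmodules content with every mapped URL replaced by its fork."""
    out = []
    for line in gitmodules_text.splitlines():
        match = _URL_LINE.match(line)
        if match and match.group(2) in url_map:
            # Keep the original indentation and "url = " prefix.
            line = match.group(1) + url_map[match.group(2)]
        out.append(line)
    return "\n".join(out) + "\n"


# Made-up sample content for illustration.
SAMPLE = (
    '[submodule "lib-a"]\n'
    "\tpath = lib-a\n"
    "\turl = https://gitlab.example.com/upstream/lib-a.git\n"
    '[submodule "lib-b"]\n'
    "\tpath = lib-b\n"
    "\turl = https://gitlab.example.com/upstream/lib-b.git\n"
)

REWRITTEN = rewrite_submodule_urls(SAMPLE, URL_MAP)
```

After writing the result back to .gitmodules on the forked-main branch, running `git submodule sync` should copy the new URLs into the local submodule configuration. Submodules missing from the mapping (lib-b here) are left untouched.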

When you decide to send a merge request upstream to where you forked from, you need to find a way to remove the submodule updates. Maybe with a script/hook which resets the commit in the history, or does a revert. That implies that the commit has a specific marker that you can grep for.

Since this might be forgotten, additional CI checks in your fork might be necessary to warn about this (detecting the submodule change commit in git log being the simplest approach).
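A check like that could look roughly like the sketch below. It assumes the submodule-update commit carries a marker in its subject line; the marker text `[fork-submodules]`, the function names, and the `upstream/master` remote name in the usage note are all made up for illustration.

```python
import subprocess

# Hypothetical marker placed in the subject of the submodule-update commit.
MARKER = "[fork-submodules]"


def find_marked_commits(log_lines, marker=MARKER):
    """Return (hash, subject) pairs for log lines whose subject contains the marker.

    Each line is expected in "<hash> <subject>" form.
    """
    hits = []
    for line in log_lines:
        commit_hash, _, subject = line.strip().partition(" ")
        if commit_hash and marker in subject:
            hits.append((commit_hash, subject))
    return hits


def marked_commits_in_range(rev_range):
    """Run `git log` over rev_range and return the marked commits found."""
    result = subprocess.run(
        ["git", "log", "--format=%h %s", rev_range],
        capture_output=True,
        text=True,
        check=True,
    )
    return find_marked_commits(result.stdout.splitlines())
```

A CI job in the fork could call `marked_commits_in_range("upstream/master..HEAD")` on a branch that is about to become a merge request and fail, or just warn, when the returned list is not empty.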

Still not a beautiful solution, now that I think about it.

Another thought would be storing the submodule commit HEAD & URL somewhere temporarily, with the ability to easily reset it. https://stackoverflow.com/questions/20655073/how-to-see-which-commit-a-git-submodule-points-at/54238999
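One way to capture the current pointers is to parse the output of `git submodule status`, which prints one line per submodule: a status character (space, `-`, `+` or `U`), the commit hash, the path, and optionally a description in parentheses. The sketch below is only an illustration under that assumption; the function name and the sample output are made up, the hashes are shortened for readability, and paths containing unusual characters are not handled.

```python
import json
import re

# <status char><hash> <path>[ (<describe output>)]
_STATUS_LINE = re.compile(r"^([ +\-U])([0-9a-f]+) (.+?)(?: \((.*)\))?$")


def parse_submodule_status(output):
    """Map each submodule path to the commit hash reported for it."""
    pointers = {}
    for line in output.splitlines():
        match = _STATUS_LINE.match(line)
        if match:
            pointers[match.group(3)] = match.group(2)
    return pointers


# Made-up sample of `git submodule status` output.
SAMPLE_STATUS = (
    " 1a2b3c4d lib-a (v1.0)\n"
    "+5e6f7a8b lib-b (heads/master)\n"
    "-9c0d1e2f lib-c\n"
)

POINTERS = parse_submodule_status(SAMPLE_STATUS)

# Serialized form that could be stored in a file and read back later.
SNAPSHOT = json.dumps(POINTERS, indent=2, sort_keys=True)
```

The saved snapshot could later be used to check out each recorded commit inside its submodule directory again. The matching URLs would have to be read from .gitmodules separately.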

Even with subtrees and updating code in specific directories, you’d need to clean things up before proposing the MR upstream. Subtrees also need to be purged and re-added when the update is not a fast-forward, so they are not a solution here either.

I would ask myself: What is the main project’s advantage of using submodules instead of pulling things together in CI/CD pipelines or Docker images later? Maybe you can get rid of the submodule approach entirely.

Cheers,
Michael

Thank you for your thoughts. If I understand it right, the first paragraph resembles the approach I had in mind: let the parent project have updated submodules locally to make everything work together, but on a pull request (or commit to the fork) revert the submodule references. Please correct me if I misunderstood anything.

I don’t know much about CI checks and hooks yet. Using them to revert specific information sounds interesting. However, in my fork’s remote repo I will likely want to keep the updated commit IDs for the submodules, so I can easily clone everything on another machine. I guess only the pull request should contain the original submodule references. I am not sure CI or hooks can achieve this.

Even if it were possible, the effort of setting this up might not be worth it for this specific project. I am not going to have pull requests on a daily basis. I will probably raise just a few of them in total.

To answer your last question: the original repo is not under my control. I just want to clean it up a bit to make life easier for others using it. Drastic changes like introducing CI workflows are out of scope.

Best wishes,
Stefan