Removing sensitive data from GitLab

Hi, this is a question about sanitizing sensitive data in GitLab. If I’m a GitLab admin and I find that one of my users has inadvertently pulled sensitive data (hardcoded variables) into their repo, is there anything else that needs to be done, aside from obliterating the entire repo, to ensure the data actually gets destroyed within GitLab itself?

Your terminology is confusing with respect to “one of my users inadvertently pulling sensitive data into their repo”, but I will interpret it as “another person with write access to the repo has pushed sensitive data to the repo”.

With git, there is no way to delete a particular commit from the history without also deleting all subsequent commits, because each commit’s hash depends on its parent commit. But you can use a script that recreates all the subsequent commits. They are then created as new commits with new commit hashes, so everyone using the repo has to stop working while this is done and then switch to the new commits (ideally with a fresh clone, since pulling into an old clone can merge the old history back in). Beware: this is a tricky operation that requires serious coordination, but it can be done.
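To illustrate why every later commit gets a new hash, here is a minimal Python sketch. It is a toy model only: the `commit_id` and `build_history` helpers and the sample commit contents are made up for illustration, and git's real commit objects contain more fields (tree, author, dates). The only point is that an ID derived from the parent's ID changes whenever any ancestor changes.

```python
import hashlib


def commit_id(parent_id, content):
    """Toy commit ID: a hash over the parent's ID plus this commit's content."""
    data = (parent_id or "") + "\n" + content
    return hashlib.sha1(data.encode("utf-8")).hexdigest()


def build_history(contents):
    """Return the commit IDs of a linear history, oldest first."""
    ids = []
    parent = None
    for content in contents:
        parent = commit_id(parent, content)
        ids.append(parent)
    return ids


# Hypothetical history where the second commit contains a secret.
original = build_history(
    ["add app", "add config with PASSWORD=hunter2", "fix typo", "add tests"]
)
# The same history with the secret removed from the second commit.
rewritten = build_history(["add app", "add config", "fix typo", "add tests"])

# Indexes of commits whose ID differs between the two histories.
changed = [i for i, (a, b) in enumerate(zip(original, rewritten)) if a != b]
print(changed)  # → [1, 2, 3]
```

The first commit keeps its ID, but the edited commit and everything after it get new ones, which is why existing clones no longer match the server after a rewrite.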

If you do that and then run Housekeeping in the repo settings, I believe that all traces should eventually disappear.
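For reference, the rewrite step could look roughly like the sketch below. It assumes the separately installed `git filter-repo` tool is the "script" used, which is my choice for the example and not the only option. The `cleanup_commands` helper, the `replacements.txt` file name, and the remote URL are all hypothetical. The function only builds the command lines and does not run anything, so you can review them first.

```python
def cleanup_commands(replacements_file, remote_url):
    """Return the git commands for a history rewrite, in order (nothing is executed)."""
    return [
        # Rewrite every commit, replacing each string listed in the file.
        # Assumption: git filter-repo is installed and run in a fresh clone.
        ["git", "filter-repo", "--replace-text", replacements_file],
        # Overwrite all branches on the server with the rewritten history.
        ["git", "push", "--force", "--all", remote_url],
        # Tags can point at old commits too, so overwrite them as well.
        ["git", "push", "--force", "--tags", remote_url],
    ]


# Hypothetical file name and remote URL, for illustration only.
commands = cleanup_commands(
    "replacements.txt", "git@gitlab.example.com:group/project.git"
)
for argv in commands:
    print(" ".join(argv))
```

Each line of the replacements file holds one string to scrub from every commit. Protected branches may need force-push allowed temporarily before the push commands succeed, and Housekeeping is still run afterwards from the repo settings as described above.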

Hi Mattias,

You’re correct. I meant to say “one of my users inadvertently pushing sensitive data to their repo”. Yep, unfortunately the historical git commits would have to be overwritten or flushed. It’s a pain, but I guess this is also a learning experience in using “.gitignore” properly.

Thanks,