https://rtyley.github.io/bfg-repo-cleaner/
an alternative to git-filter-branch
The BFG is a simpler, faster alternative to git-filter-branch
for cleansing bad data out of your Git repository history:
- Removing Crazy Big Files
- Removing Passwords, Credentials & other Private data
The git-filter-branch command is enormously powerful and can do things that the BFG can't - but the BFG is much better for the tasks above, because:
- Faster: 10-720x faster
- Simpler: The BFG isn't particularly clever, but is focused on making the above tasks easy
- Beautiful: If you need to, you can use the beautiful Scala language to customise the BFG, which has got to be better than Bash scripting at least some of the time.
Usage
First, clone a fresh copy of your repo using the --mirror flag:
$ git clone --mirror git://example.com/some-big-repo.git
This is a bare repo, which means your normal files won't be visible, but it is a full copy of the Git database of your repository, and at this point you should make a backup of it to ensure you don't lose anything.
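One way to take that backup is a timestamped tarball of the whole mirror. This is a minimal sketch: the `backup_mirror` helper and its archive naming scheme are our own choices, not part of the BFG or Git.

```shell
#!/usr/bin/env bash
# Hypothetical helper: archive a bare mirror clone before the BFG rewrites it.
backup_mirror() {
  local repo="${1%/}"                              # drop any trailing slash
  local archive
  archive="$(basename "$repo" .git)-backup-$(date +%Y%m%d-%H%M%S).tar.gz"
  # Store the repo under its own directory name; the archive lands in the current directory.
  tar -czf "$archive" -C "$(dirname "$repo")" "$(basename "$repo")"
  printf '%s\n' "$archive"                         # report the archive name
}
```

For the example above, `backup_mirror some-big-repo.git` would produce something like `some-big-repo-backup-<timestamp>.tar.gz`.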
Now you can run the BFG to clean your repository up:
$ java -jar bfg.jar --strip-blobs-bigger-than 100M some-big-repo.git
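To examine which branches and tags the BFG actually rewrote, one option is to record every ref and the commit it points to before running the command above, then again after it, and diff the two lists. A minimal sketch; the `snapshot_refs` helper and the `refs-before.txt`/`refs-after.txt` file names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every ref with the object it points to, sorted by ref name.
snapshot_refs() {
  git -C "$1" for-each-ref --format='%(refname) %(objectname)' | sort
}

# Usage sketch (repo name matches the example above):
#   snapshot_refs some-big-repo.git > refs-before.txt
#   java -jar bfg.jar --strip-blobs-bigger-than 100M some-big-repo.git
#   snapshot_refs some-big-repo.git > refs-after.txt
#   diff refs-before.txt refs-after.txt   # changed lines are refs that now point elsewhere
```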
The BFG will update your commits and all branches and tags so they are clean, but it doesn't physically delete the unwanted stuff. Examine the repo to make sure your history has been updated, and then use the standard git gc command to strip out the unwanted dirty data, which Git will now recognise as surplus to requirements:
$ cd some-big-repo.git
$ git reflog expire --expire=now --all && git gc --prune=now --aggressive
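A quick way to confirm that the gc actually reclaimed space is to compare the packed size of the repo before running the BFG and after the gc above. A sketch built on `git count-objects`; the `pack_size` helper name is our own.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the human-readable size of a repo's packfiles.
# `git count-objects -v -H` reports a "size-pack: ..." line among its other fields.
pack_size() {
  git -C "$1" count-objects -v -H | awk -F': ' '$1 == "size-pack" { print $2 }'
}

# Usage sketch:
#   pack_size some-big-repo.git   # before running the BFG
#   pack_size some-big-repo.git   # again after the reflog expire + gc above
```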
Finally, once you're happy with the updated state of your repo, push it back up (note that because your clone command used the --mirror flag, this push will update all refs on your remote server):
$ git push
At this point, you're ready for everyone to ditch their old copies of the repo and do fresh clones of the nice, new pristine data. It's best to delete all old clones, as they'll have dirty history that you don't want to risk pushing back into your newly cleaned repo.
Examples
In all these examples bfg is an alias for java -jar bfg.jar.
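If you want the same shorthand, an alias along these lines could go in your shell startup file (for example `~/.bashrc`). The jar location `$HOME/bin/bfg.jar` is an assumption; point it at wherever you saved the downloaded jar.

```shell
# Assumed jar location; adjust to where you downloaded the BFG jar.
alias bfg='java -jar "$HOME/bin/bfg.jar"'
```

Non-interactive bash does not expand aliases by default, so inside a script a function such as `bfg_run() { java -jar "$HOME/bin/bfg.jar" "$@"; }` (name hypothetical) does the same job.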
Delete all files named 'id_rsa' or 'id_dsa':
$ bfg --delete-files 'id_{dsa,rsa}' my-repo.git
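A note on shell quoting: `{dsa,rsa}` is also bash and zsh brace-expansion syntax, so if the pattern is left unquoted the shell splits it into two separate arguments before the BFG ever sees it. Quoting the glob passes it through intact, as this small demonstration shows (it only uses `printf`, so it is safe to run anywhere bash is available).

```shell
#!/usr/bin/env bash
# Unquoted: bash brace expansion turns one word into two.
printf '[%s]\n' id_{dsa,rsa}     # prints [id_dsa] then [id_rsa]
# Quoted: the program receives the glob exactly as written.
printf '[%s]\n' 'id_{dsa,rsa}'   # prints [id_{dsa,rsa}]
```

So the quoted form `bfg --delete-files 'id_{dsa,rsa}' my-repo.git` is the safer way to type this example.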
Remove all blobs bigger than 50 megabytes:
$ bfg --strip-blobs-bigger-than 50M my-repo.git
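To pick a sensible threshold, it can help to see which blobs are actually the largest first. One common plumbing recipe (our suggestion, not a BFG feature) pipes `git rev-list --objects --all` into `git cat-file --batch-check`; the `largest_blobs` helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the N largest blobs anywhere in history.
# Output columns: size in bytes, blob id, path where the blob was first seen.
largest_blobs() {
  local repo="$1" count="${2:-10}"
  git -C "$repo" rev-list --objects --all |
    git -C "$repo" cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
    awk '$1 == "blob"' |        # keep blobs only (drop commits and trees)
    cut -d ' ' -f 2- |          # drop the type column, keep paths with spaces intact
    sort -k1,1nr |              # largest first
    head -n "$count"
}
```

For example, `largest_blobs my-repo.git 20` lists the top twenty; sizes are as reported by `git cat-file`.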
Replace all passwords listed in a file (prefix lines with 'regex:' or 'glob:' if required) with ***REMOVED*** wherever they occur in your repository:
$ bfg --replace-text passwords.txt my-repo.git
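The expressions file is plain text with one entry per line. A sketch of creating one; the values are made-up placeholders, with a literal password on the first line and a 'regex:' line (here, an AWS-style access key pattern) on the second.

```shell
#!/usr/bin/env bash
# Write a hypothetical passwords.txt: one expression per line.
# Plain lines are matched literally; a 'regex:' prefix switches to regular-expression matching.
cat > passwords.txt <<'EOF'
hunter2
regex:AKIA[0-9A-Z]{16}
EOF
```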
Remove all folders or files named '.git' (a reserved filename in Git). These often become a problem when migrating to Git from other source-control systems like Mercurial:
$ bfg --delete-folders .git --delete-files .git --no-blob-protection my-repo.git
For further command-line options, you can run the BFG without any arguments, which will print its full usage text.