Using git filter-repo to rewrite commit history
git filter-repo is a neat tool to rewrite commit history.
A recent use case required me to move an entire repository into another repository while preserving its history. A simple git subtree would do the move, but there was one small problem. After the history moved over, the commit messages would still reference pull request numbers from the original repo. GitHub would turn those numbers into links against the new repo, so they would point to the wrong PRs.
To avoid that, I needed to rewrite the commit history. Since there were a great many commits, I had to find a scriptable way to do it. This is where git filter-repo, the replacement for the old filter-branch, comes in handy.
For this, filter-repo offers the --message-callback option, which takes the body of a Python function that processes each commit message. The body receives the original commit message in the variable message, and whatever it returns becomes the new commit message; both are byte strings. In this case, a regular expression that turns a trailing (#123) on a line into a full link to the pull request in the original repo does the trick:
git filter-repo --message-callback 'return re.sub(br"\(#(\d{1,3})\)$", br"(https://github.com/FooOrg/BarRepo/pull/\1)", message, flags=re.M)'
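Because the callback body is plain Python, it can be sanity-checked on a few sample messages before rewriting real history. The sketch below wraps the same re.sub call in an ordinary function whose parameter is named message, like the variable the callback body sees. The function name and the sample messages are made up for illustration.

```python
import re


def message_callback(message: bytes) -> bytes:
    # Same expression as the --message-callback body: a "(#123)" at the end
    # of a line becomes a link to that PR in the original repository.
    return re.sub(
        br"\(#(\d{1,3})\)$",
        br"(https://github.com/FooOrg/BarRepo/pull/\1)",
        message,
        flags=re.M,  # let $ match at the end of every line, not only the last
    )


if __name__ == "__main__":
    # Hypothetical sample commit messages.
    samples = [
        b"Add retry logic (#42)\n",
        b"Bump version (#7)\n\nCloses #7\n",
        b"Fix typo (#1234)\n",
    ]
    for original in samples:
        print(original, "->", message_callback(original))
```

With these samples, the first two titles get full links. The four-digit (#1234) is left alone because of \d{1,3}, and the bare "Closes #7" in the body is not matched because the pattern requires the parentheses at the end of a line.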
Another consideration: I only wanted to rewrite the commits that came from the original repo, but by default filter-repo rewrites every commit in the repository. The --refs argument exists for exactly this purpose and lets you specify a range of commits. For example, I had all the commits in my local branch ahe/foo, so I told filter-repo to rewrite only the commits that are on ahe/foo but not on main: --refs main..ahe/foo.
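Putting the two options together gives a single filter-repo invocation. As a sketch, the hypothetical helper build_filter_repo_args below (not part of filter-repo) assembles that argument list so it can be printed for review first. The branch names and repository URL are the ones used in this post and would need adjusting.

```python
import shlex

# The --message-callback body from above, kept as one string.
CALLBACK = (
    r'return re.sub(br"\(#(\d{1,3})\)$", '
    r'br"(https://github.com/FooOrg/BarRepo/pull/\1)", message, flags=re.M)'
)


def build_filter_repo_args(base: str, branch: str, callback: str = CALLBACK) -> list[str]:
    """Hypothetical helper: argv that rewrites only the commits in base..branch."""
    return [
        "git", "filter-repo",
        "--message-callback", callback,
        "--refs", f"{base}..{branch}",
    ]


if __name__ == "__main__":
    args = build_filter_repo_args("main", "ahe/foo")
    # Print a shell-quoted version of the command to review before running it.
    print(shlex.join(args))
```

Building a list rather than one shell string avoids having to quote the callback body by hand. Once the printed command looks right, the same list can be passed to subprocess.run(args, check=True).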