Using git filter-repo to rewrite commit history

git filter-repo is a neat tool to rewrite commit history.

A recent use case required me to move an entire repository into another repository while preserving its history. A simple git subtree would do the move, but there was one small problem: the moved commit messages still referenced pull request numbers from the original repo. GitHub would turn each reference into a link using the same PR number in the new repo, which points at the wrong PR.

To avoid that, I needed to rewrite the commit messages. Since there were a lot of commits, I had to find a scriptable way to do it. This is where git filter-repo, the replacement for the old git filter-branch, comes in handy.

filter-repo can filter commit messages through its --message-callback option, which takes the body of a Python function as its argument. The callback receives the original commit message in a variable named message and returns the new commit message, both as byte strings. In this case, a simple regular expression does the trick:

git filter-repo --message-callback 'return re.sub(br"\(#(\d{1,3})\)\n", br"(https://github.com/FooOrg/BarRepo/pull/\1)\n", message)'
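Before rewriting history for real, it can help to try the substitution on a sample message in plain Python. The sketch below assumes the PR references look like the "(#123)" suffix GitHub appends to merge commit titles; the sample message and the function name rewrite_message are made up for illustration, and FooOrg/BarRepo is the same placeholder as in the command above.

```python
import re

# Matches a "(#123)" suffix at the end of a line. The outer parentheses are
# escaped so they match literally; group 1 captures only the digits.
PR_REF = re.compile(br"\(#(\d{1,3})\)\n")

# Replacement template: a full link to the PR in the original repository
# (FooOrg/BarRepo is a placeholder), keeping the trailing newline.
PR_URL = br"(https://github.com/FooOrg/BarRepo/pull/\1)\n"

def rewrite_message(message: bytes) -> bytes:
    """Replace "(#N)" line endings with a link to the original PR."""
    return PR_REF.sub(PR_URL, message)

# Made-up commit message for illustration.
sample = b"Fix flaky test (#42)\n\nLonger description.\n"
print(rewrite_message(sample).decode())
```

Note that \d{1,3} only covers PR numbers up to 999, so a repo with more PRs than that would need a wider quantifier.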

Another consideration is that I only wanted to rewrite the commits that came from the original repo, but filter-repo by default rewrites every commit in the repo. The --refs argument is exactly for this purpose: it lets you specify a range of commits. For example, all the imported commits are on my local branch ahe/foo, so I tell filter-repo to rewrite only the commits that are on ahe/foo but not on main: --refs main..ahe/foo.
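To script the whole thing, the message callback and the --refs range can go into a single invocation. Here is a minimal sketch that drives it from Python, assuming git filter-repo is installed and on the PATH. The helper names build_command and rewrite are hypothetical, and the branch names are the ones from the example above.

```python
import subprocess

# Body passed to --message-callback. filter-repo runs it with `message`
# bound to the commit message as bytes. FooOrg/BarRepo is a placeholder.
CALLBACK = (
    r'return re.sub(br"\(#(\d{1,3})\)\n", '
    r'br"(https://github.com/FooOrg/BarRepo/pull/\1)\n", message)'
)

def build_command(base="main", branch="ahe/foo"):
    """Build the argv that rewrites only commits on branch but not on base."""
    return [
        "git", "filter-repo",
        "--message-callback", CALLBACK,
        "--refs", f"{base}..{branch}",
    ]

def rewrite(repo_dir, base="main", branch="ahe/foo"):
    """Run the rewrite in repo_dir; raises CalledProcessError on failure."""
    subprocess.run(build_command(base, branch), cwd=repo_dir, check=True)
```

Passing the arguments as a list avoids shell quoting problems with the regular expression.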