[ANNOUNCE] git_fast_filter
Elijah Newren
newren at gmail.com
Wed Aug 31 22:00:32 BST 2011
Just thought I'd make this available, in case there are others with
niche needs who find it useful...
git_fast_filter assists with quickly rewriting the history of a repository
by making it easy to write scripts whose purpose is to serve as safe
filters between fast-export and fast-import. git_fast_filter comes with
example programs, a basic test-suite, and a double your money back
satisfaction guarantee. (I love free software.) You can get it from
git://gitorious.org/git_fast_filter/mainline.git
In more detail...
=== Purpose ===
git_fast_filter is designed to make it easy to filter or rewrite the
history of a repository. As such, it fills the same role as
git-filter-branch, and was written primarily to overcome the sometimes
severe speed shortcomings of git-filter-branch. In particular, using
git_fast_filter can avoid thousands or millions of new process forks, and
can allow you to rewrite the same file only one time instead of 50,000
times. However, while using git_fast_filter is fairly simple and quick, it
is hard to beat a simple git-filter-branch one-liner in terms of human
time. Also, the two tools use very different methods of rewriting
history and do not have exactly overlapping feature sets, so the best tool
for a particular job is going to be very problem dependent.
As human time is often more important than computer time, especially for
one-shot rewrites, git-filter-branch will probably continue to be the more
common tool. However, git_fast_filter is useful in cases where computer
time of a rewrite matters (particularly larger repositories and more
involved rewrites that need to be run and tested many times on large data
sets). Also, git_fast_filter has a couple of features that may come in
handy in special cases (assisting with generating fast-export output from
scratch, interleaving commits from separate repositories, and bidirectional
collaboration between filtered and unfiltered repositories).
=== Idea ===
The way git_fast_filter works is by providing a simple python library,
git_fast_filter.py. This library can be used in simple python scripts to
create a filter for the output of git-fast-export. Thus, the typical
calling convention is of the form:
git fast-export | filter_script.py | git fast-import
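To make concrete what a "safe" filter has to do at the stream level, here is a minimal hand-written sketch in Python 3. It is not git_fast_filter's implementation, and the function name `rename_branch` is hypothetical. The key point is that the fast-export stream interleaves command lines with raw payloads: each `data <count>` line is followed by exactly that many bytes of file contents or commit message, which must be copied verbatim rather than scanned for patterns. The sketch assumes the counted `data` form that git fast-export emits (the delimited `data <<EOF` form is not handled), and it only rewrites `commit <ref>` and `reset <ref>` lines.

```python
from typing import BinaryIO


def rename_branch(src: BinaryIO, dst: BinaryIO, old: bytes, new: bytes) -> None:
    """Copy a fast-export stream from src to dst, renaming ref `old` to `new`.

    Hypothetical helper for illustration only. Payloads after 'data <count>'
    lines are passed through byte-for-byte, so the ref name appearing inside
    file contents or commit messages is never touched.
    """
    while True:
        line = src.readline()
        if not line:
            break  # end of stream
        if line.startswith(b"data "):
            # Counted data block: 'data <count>\n' then exactly <count> raw bytes.
            count = int(line[5:].strip())
            dst.write(line)
            dst.write(src.read(count))
            continue
        # The trailing space matters: it keeps 'committer ...' lines from matching.
        for cmd in (b"commit ", b"reset "):
            if line.startswith(cmd) and line[len(cmd):].rstrip(b"\n") == old:
                line = cmd + new + b"\n"
                break
        dst.write(line)
```

A script ending with `rename_branch(sys.stdin.buffer, sys.stdout.buffer, b"refs/heads/master", b"refs/heads/other")` could stand in for filter_script.py in the pipeline above; keeping track of details like this is exactly the bookkeeping a library such as git_fast_filter is meant to take off the script author's hands.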
=== Example ===
An example script that renames the 'master' branch to 'other' is shown
below (this is similar to the example in the git-fast-export manpage, but
is safe against the string 'refs/heads/master' appearing in some file or
commit message in the repository):
#!/usr/bin/python
from git_fast_filter import Commit, FastExportFilter

def my_commit_callback(commit):
  if commit.branch == "refs/heads/master":
    commit.branch = "refs/heads/other"

filter = FastExportFilter(commit_callback = my_commit_callback)
filter.run()
The user can then run this script as follows:
$ mkdir target && cd target && git init
$ (cd /PATH/LEADING/TO/source && git fast-export --all) \
| /PATH/TO/filter_script.py | git fast-import
(Note: The user can have the script take care of the git init, the cd's,
and the invocations of git fast-export and git fast-import by just passing
directory names to FastExportFilter.run; however, writing out the details
explicitly as in the above example makes it clearer what is going on.)
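For anyone who wants those setup steps automated without relying on FastExportFilter.run (whose argument conventions are not spelled out here), the following is a minimal sketch using Python 3's subprocess module. The helper name `run_rewrite` and its parameters are hypothetical. It creates and initializes the target directory, runs git fast-export --all in the source repository, pipes the stream through an arbitrary filter command, and feeds the result to git fast-import in the target, raising an error if any stage fails.

```python
import subprocess
from pathlib import Path
from typing import Sequence


def run_rewrite(source: Path, target: Path, filter_cmd: Sequence[str]) -> None:
    """Hypothetical helper: git fast-export | filter_cmd | git fast-import.

    Mirrors the manual mkdir / git init / cd steps; `target` must not exist
    yet and is created as a fresh repository.
    """
    target.mkdir(parents=True)  # raises FileExistsError if target exists
    subprocess.run(["git", "init", "--quiet"], cwd=target, check=True)

    exporter = subprocess.Popen(
        ["git", "fast-export", "--all"], cwd=source, stdout=subprocess.PIPE
    )
    filt = subprocess.Popen(
        list(filter_cmd), stdin=exporter.stdout, stdout=subprocess.PIPE
    )
    exporter.stdout.close()  # only the filter should hold this pipe's read end
    importer = subprocess.Popen(
        ["git", "fast-import", "--quiet"], cwd=target, stdin=filt.stdout
    )
    filt.stdout.close()  # only fast-import should hold this pipe's read end

    # Wait for the downstream end first, then check each stage's exit status.
    for name, proc in (("git fast-import", importer),
                       ("filter", filt),
                       ("git fast-export", exporter)):
        if proc.wait() != 0:
            raise RuntimeError(f"{name} exited with status {proc.returncode}")
```

For example, `run_rewrite(Path("/PATH/LEADING/TO/source"), Path("target"), ["/PATH/TO/filter_script.py"])` would correspond to the shell commands shown earlier. Closing the parent's copies of the pipe ends lets each stage see end-of-file (or a broken pipe) when its neighbour exits.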
Elijah
More information about the git-announce mailing list