[ANNOUNCE] gitfs pre-release 0.01

Mitchell Blank Jr mitch at sfgoth.com
Wed Aug 31 21:59:35 BST 2011


GITFS pre-release version 0.01

gitfs is a FUSE-based filesystem for working with source trees stored in
git repositories.  Currently only very basic functionality is implemented,
but I'm hoping to expand it into a useful tool for managing many builds
and patches.

OVERALL PLAN
======= ====

Many years ago I had an idea for a filesystem that would make importing
patches and spinning kernel builds far more efficient.  The basic idea was:

  1. Store the unchanged part of the source tree in a shared repository;
     only keep a separate copy of the files and directories that have
     changed.  This would be similar to doing hardlinked source tree copies
     but be even faster -- checking out a new tree would be as quick as
     mkdir'ing a new empty directory.

  2. On top of this, implement a very fast "diff" operator that only looks
     at changed files.

  3. Have ccache-style compiler caching built in.  The filesystem could
     (using some wrapper programs) watch every file read and written to
     by a command like "gcc".  Since it knows what versions of those files
     were read at that time it can know very quickly if any of them changed.

     This saves the "gcc -E" step that ccache must do to determine that it
     can use the cached .o result and should be quite a bit faster.

So basically common operations such as "compile a test kernel with this
fix" and "produce a well-formed patch describing how my current tree
diverges from mainline" become very fast.
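
As a rough illustration of idea 3, here is a minimal Python sketch of a
compile cache keyed on the versions of the files a command read.  All names
(content_id, CompileCache, store, lookup) are hypothetical and not part of
gitfs, and a plain SHA-1 of the file contents stands in for whatever version
identifier the filesystem would actually track.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Stand-in version identifier: a hash of the file contents (assumption)."""
    return hashlib.sha1(data).hexdigest()


class CompileCache:
    """Remembers which file versions a command read and what it produced."""

    def __init__(self):
        # command (tuple of args) -> list of (inputs_read, output) records,
        # where inputs_read maps path -> content ID seen during that run
        self._records = {}

    def store(self, command, inputs_read, output):
        """Record one run: the files it read (with versions) and its output."""
        self._records.setdefault(command, []).append((dict(inputs_read), output))

    def lookup(self, command, current_ids):
        """Return a cached output if every file read in that run is unchanged."""
        for inputs_read, output in self._records.get(command, []):
            if all(current_ids.get(path) == cid
                   for path, cid in inputs_read.items()):
                return output
        return None
```

The point of the sketch is that a hit needs only a comparison of recorded
identifiers against current ones; no preprocessing pass is run.  A real
version would also have to record lookups of files that did not exist, since
a header appearing later in the include path can change the result.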

Several times I started to implement this idea, but every time I got bogged
down in the details of getting the in-kernel parts of the filesystem to work.
However, two things changed recently:

  1. git came along; I realized that I could use it for the backing data
     store.  Since the Linux kernel is already published in git format
     this is especially handy.  Leveraging the existing git code and
     design has sped this up immeasurably.

  2. FUSE (filesystem in userspace) has become more widely available --
     it hasn't made it to mainline yet but it is in the -mm series kernels.
     This made getting started on actual implementation a lot easier.
     Writing a userspace filesystem on top of FUSE is really a joy.

I'm currently calling this project "gitfs" although perhaps that is a
bit of a misnomer since I am _absolutely_not_ trying to implement the
full SCM workflow as a filesystem.  In fact we present hardly any git
metadata like commit messages at all.  Also, I operate on the underlying
objects directly - the index file is never touched.

However, I decided to stick with the "gitfs" name for now -- I'm hoping that
this project can grow to become a useful complement to the git workflow.
I'm not averse to giving it a different name if it's an issue, though.

CURRENT STATE
======= =====

This is an early pre-release that only demonstrates the most basic of
functionality -- read-only access to the existing tags and objects in
the git repository.  Still, it's already a somewhat handy tool which is
why I'm announcing it now.

In addition to the missing functionality, there is currently a lot of
performance work to do -- I've been working on getting it functionally
correct first.  Specific performance work I'm planning includes:

  1. Every time we touch a directory (whether lookup or readdir) we parse
     the git tree object into a memory structure which then immediately gets
     thrown away.  There is some infrastructure for caching these objects
     in memory which will solve the problem, but it's not completed yet.

  2. On a related note, I need to do some data structure work -- in some
     places I'm using simple linked lists where I really should be using
     B-trees or something.  I actually have a lot of this work done already
     but I need to do some heavy testing before I integrate that into my
     tree.

  3. Since we cache the uncompressed file data, our read/write operations
     always go straight through to the underlying files.  A large performance
     boost would be available if at open()-time we could tell the kernel
     "here's the file descriptor I opened for you, do I/O to that".  That
     way we could avoid the need for all data to make two user/kernel
     transitions.  However, this would require some extensive work on
     FUSE to implement.

  4. We are currently single-threaded; I eventually plan to add
     service threads for handling CPU-bound tasks.  I want to keep the
     normal filesystem operations single-threaded (they're generally just
     walking in-memory structures so they're fast anyway), but things like
     uncompressing a git object should really be done in a separate thread
     so they won't block other filesystem operations.
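
Item 1 above can be sketched in a few lines of Python.  This is only an
illustration, not gitfs code: parse_tree, TreeCache and load_body are
hypothetical names, and the entry layout assumed is the one git uses for
tree objects (octal mode, a space, the name, a NUL byte, then 20 raw SHA-1
bytes).

```python
def parse_tree(body: bytes):
    """Parse a git tree object body into (mode, name, sha_hex) tuples."""
    entries = []
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)       # end of the octal mode
        nul = body.index(b"\0", space)      # end of the entry name
        mode = int(body[pos:space].decode("ascii"), 8)
        name = body[space + 1:nul].decode("utf-8", "replace")
        sha_hex = body[nul + 1:nul + 21].hex()   # 20 raw bytes follow the NUL
        entries.append((mode, name, sha_hex))
        pos = nul + 21
    return tuple(entries)


class TreeCache:
    """Keeps parsed trees in memory so lookup and readdir can reuse them."""

    def __init__(self, load_body):
        # load_body is a hypothetical callable: sha_hex -> raw tree body
        self._load_body = load_body
        self._parsed = {}
        self.parse_count = 0    # exposed only to make the effect visible

    def get(self, sha_hex):
        entries = self._parsed.get(sha_hex)
        if entries is None:
            entries = parse_tree(self._load_body(sha_hex))
            self._parsed[sha_hex] = entries
            self.parse_count += 1
        return entries
```

Because a tree is named by the hash of its contents, a cached entry never
goes stale; the only policy question is when to evict, which this sketch
leaves out.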

Finally, since I'm still working on finishing the infrastructure work, please
just consider this a "preview release".  Feel free to play with it, look at
the code, poke it with sticks, etc.  However, the code base is still rapidly
evolving so I probably won't be able to integrate any non-trivial patches yet.
The code also needs things like more comments and clear error messages.

BUILDING GITFS
======== =====

  Gitfs can currently be obtained at:
	http://www.sfgoth.com/~mitch/linux/gitfs/

  Please refer to the included INSTALL file for directions on compiling
  the gitfs binary.

RUNNING GITFS
======= =====

  MOUNT:
    gitfs [-d] [-O object_cache_dir] <gitdir> <mntpoint>
  UMOUNT:
    gitfs -u [-d] <dir>

  Options:

    -d -- debugging mode; we run in the foreground and print very verbose
          messages about what is going on (mostly courtesy of FUSE)

    -O -- specify an object cache directory.  For fast performance we always
	  store the result of decompressing a git "blob" object in a file.
	  This directory is where the decompressed objects live.

	  This currently defaults to "/tmp/gitfs/ocache".  DO NOT make this
	  the same as your ".git/objects" directory or things will probably
	  become horribly broken!

	  Currently gitfs never removes anything from the ocache so it
	  can grow quite large.  However it's safe to prune files from it
	  (or even blow away the entire tree) while gitfs is running.
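
To make the object cache idea concrete, here is a minimal Python sketch of
filling such a cache on demand.  It is an assumption-laden illustration, not
the gitfs implementation: cached_blob_path is a hypothetical name, the cache
is assumed to be a flat directory of files named by object ID, git_dir is
assumed to be the directory that contains "objects", and only loose
zlib-compressed objects with a "<type> <size>" header followed by a NUL
byte are handled.

```python
import os
import zlib


def cached_blob_path(git_dir, cache_dir, sha_hex):
    """Return a path to a blob's decompressed contents, creating it if needed."""
    cached = os.path.join(cache_dir, sha_hex)
    if os.path.exists(cached):
        return cached

    # Loose objects live under objects/<first 2 hex digits>/<remaining 38>.
    loose = os.path.join(git_dir, "objects", sha_hex[:2], sha_hex[2:])
    with open(loose, "rb") as f:
        raw = zlib.decompress(f.read())

    # Split off the "<type> <size>" header that precedes the contents.
    header, _, content = raw.partition(b"\0")
    if header.split(b" ", 1)[0] != b"blob":
        raise ValueError("object %s is not a blob" % sha_hex)

    # Write to a temporary name and rename, so a reader never sees a
    # partially written cache file.
    os.makedirs(cache_dir, exist_ok=True)
    tmp = "%s.tmp%d" % (cached, os.getpid())
    with open(tmp, "wb") as f:
        f.write(content)
    os.replace(tmp, cached)
    return cached
```

In this sketch a missing cache file is simply regenerated from the
repository on the next call, which is one way pruning the cache while the
filesystem is running could stay safe.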

Under normal operation gitfs would run in the background until you unmount
it with "gitfs -u".  *However*, we currently always run in "debug" mode, so
the gitfs program runs in the foreground.  To shut it down, just send it a
Ctrl-C and it should exit cleanly.  For now you should only have to use
"gitfs -u" if something goes wrong and it crashes.

EXAMPLE SESSION
======= =======

  $ gitfs ~/git/linux-2.6 /tmp/fuse

  [then in another window]
  $ cd /tmp/fuse
  $ ls -l
  total 0
  dr-xr-xr-x  2 mitch mitch 0 Apr 20 16:38 HEADS
  dr-xr-xr-x  2 mitch mitch 0 May 24 20:32 TAGS
  $ ls -l TAGS
  total 0
  lrwxrwxrwx  1 mitch mitch 43 May  4 16:51 v2.6.11 -> ../5dc01c595e6c6ec9ccda4f6f69c131c0dd945f8c
  lrwxrwxrwx  1 mitch mitch 43 May  4 16:51 v2.6.11-tree -> ../5dc01c595e6c6ec9ccda4f6f69c131c0dd945f8c
  lrwxrwxrwx  1 mitch mitch 43 May  1 17:16 v2.6.12-rc2 -> ../9e734775f7c22d2f89943ad6c745571f1930105f
  lrwxrwxrwx  1 mitch mitch 43 May  1 17:15 v2.6.12-rc3 -> ../0397236d43e48e821cce5bbe6a80a1a56bb7cc3a
  lrwxrwxrwx  1 mitch mitch 43 May  6 22:22 v2.6.12-rc4 -> ../ebb5573ea8beaf000d4833735f3e53acb9af844c
  lrwxrwxrwx  1 mitch mitch 43 May 24 20:32 v2.6.12-rc5 -> ../06f6d9e2f140466eeb41e494e14167f90210f89d
  $ cd TAGS/v2.6.11
  $ ls
  arch     Documentation  init    MAINTAINERS  README          sound
  COPYING  drivers        ipc     Makefile     REPORTING-BUGS  usr
  CREDITS  fs             kernel  mm           scripts
  crypto   include        lib     net          security
  $ pwd
  /tmp/fuse/TAGS/v2.6.11
  $ /bin/pwd
  /var/tmp/fuse/5dc01c595e6c6ec9ccda4f6f69c131c0dd945f8c
  $ ls -l /tmp/fuse
  total 0
  dr-xr-xr-x  18 mitch mitch 352 May  4 16:50 5dc01c595e6c6ec9ccda4f6f69c131c0dd945f8c
  dr-xr-xr-x   2 mitch mitch   0 Apr 20 16:38 HEADS
  dr-xr-xr-x   2 mitch mitch   0 May 24 20:32 TAGS