GITFS pre-release version 0.01

gitfs is a FUSE-based filesystem for working with source trees stored in
git repositories. Currently only very basic functionality is
implemented, but I'm hoping to expand it into a useful tool for managing
many builds and patches.

OVERALL PLAN
======= ====

Many years ago I had an idea for a filesystem that would make importing
patches and spinning kernel builds far more efficient. The basic idea
was:

1. Store the unchanged part of the source tree in a shared repository;
   only keep a separate copy of the files and directories that have
   changed. This would be similar to doing hardlinked source tree
   copies but be even faster -- checking out a new tree would be as
   quick as mkdir'ing a new empty directory.

2. On top of this, implement a very fast "diff" operator that only
   works on changed files.

3. Have ccache-style compiler caching built in. The filesystem could
   (using some wrapper programs) watch every file read or written by a
   command like "gcc". Since it knows which versions of those files
   were read at that time, it can tell very quickly whether any of them
   have changed. This saves the "gcc -E" step that ccache must do to
   determine that it can use the cached .o result and should be quite a
   bit faster.

So basically common operations such as "compile a test kernel with this
fix" and "produce a well-formed patch describing how my current tree
diverges from mainline" become very fast.

Several times I started to implement this idea, but every time I got
bogged down in the details of getting the kernel parts of the filesystem
and such to work. However, two things changed recently:

1. git came along; I realized that I could use it for the backing data
   store. Since the Linux kernel is already published in git format
   this is especially handy. Leveraging the existing git code and
   design has sped this up immeasurably.

2. FUSE (filesystem in userspace) has become more widely available --
   it hasn't made it to mainline yet but it is in the -mm series
   kernels.
This made getting started on actual implementation a lot easier.
Writing a userspace filesystem on top of FUSE is really a joy.

I'm currently calling this project "gitfs", although perhaps that is a
bit of a misnomer since I am _absolutely_not_ trying to implement the
full SCM workflow as a filesystem. In fact, we present hardly any git
metadata, such as commit messages, at all. Also, I operate on the
underlying objects directly -- the index file is never touched.

However, I decided to stick with the "gitfs" name for now -- I'm hoping
that this project can grow to become a useful complement to the git
workflow. I'm not averse to giving it a different name if it's an
issue, though.

CURRENT STATE
======= =====

This is an early pre-release that only demonstrates the most basic
functionality -- read-only access to the existing tags and objects in
the git repository. Still, it's already a somewhat handy tool, which is
why I'm announcing it now.

In addition to the missing functionality, there is currently a lot of
performance work to do -- I've been working on getting it functionally
correct first. Specific performance work I'm planning includes:

1. Every time we touch a directory (whether lookup or readdir) we parse
   the git tree object into a memory structure which then immediately
   gets thrown away. There is some infrastructure for caching these
   objects in memory which will solve the problem, but it's not
   completed yet.

2. On a related note, I need to do some data structure work -- in some
   places I'm using simple linked lists where I really should be using
   B-trees or something. I actually have a lot of this work done
   already, but I need to do some heavy testing before I integrate that
   into my tree.

3. Since we cache the uncompressed file data, our read/write operations
   always go straight through to the underlying files.
   A large performance boost would be available if at open()-time we
   could tell the kernel "here's the file descriptor I opened for you,
   do I/O to that". That way we could avoid the need for all data to
   make two user/kernel transitions. However, this would require some
   extensive work on FUSE to implement.

4. We are currently single-threaded; I am eventually planning on adding
   service threads for handling CPU-bound tasks. I want to keep the
   normal filesystem operations single-threaded (they're generally just
   walking in-memory structures so they're fast anyway), but things
   like uncompressing a git object should really be done in a separate
   thread so they won't block other filesystem operations.

Finally, since I'm still working on finishing the infrastructure work,
please just consider this a "preview release". Feel free to play with
it, look at the code, poke it with sticks, etc. However, the code base
is still rapidly evolving so I probably won't be able to integrate any
non-trivial patches yet. The code also needs things like more comments
and clear error messages.

BUILDING GITFS
======== =====

Gitfs can currently be obtained at:
   http://www.sfgoth.com/~mitch/linux/gitfs/

Please refer to the included INSTALL file for directions on compiling
the gitfs binary.

RUNNING GITFS
======= =====

MOUNT:  gitfs [-d] [-O object_cache_dir] <gitdir> <mntpoint>
UMOUNT: gitfs -u [-d] <dir>

Options:
  -d -- debugging mode; we run in the foreground and print very verbose
        messages about what is going on (mostly courtesy of FUSE)
  -O -- specify an object cache directory. For fast performance we
        always store the result of decompressing a git "blob" object in
        a file. This directory is where the decompressed objects live.
        This currently defaults to "/tmp/gitfs/ocache".
        DO NOT make this the same as your ".git/objects" directory or
        things will probably become horribly broken!
        Currently gitfs never removes anything from the ocache so it
        can grow quite large.
        However, it's safe to prune files from it (or even blow away
        the entire tree) while gitfs is running.

Under normal operation gitfs would run in the background until you
unmount it with "gitfs -u". *However*, we currently always run in
"debug" mode, so the gitfs program runs in the foreground. To shut it
down you just have to send it a ctrl-C and it should shut down cleanly.
For now you should only have to use "gitfs -u" if something goes wrong
and it crashes.

EXAMPLE SESSION
======= =======

$ gitfs ~/git/linux-2.6 /tmp/fuse
  [then in another window]
$ cd /tmp/fuse
$ ls -l
total 0
dr-xr-xr-x  2 mitch mitch 0 Apr 20 16:38 HEADS
dr-xr-xr-x  2 mitch mitch 0 May 24 20:32 TAGS
$ ls -l TAGS
total 0
lrwxrwxrwx  1 mitch mitch 43 May  4 16:51 v2.6.11 -> ../5dc01c595e6c6ec9ccda4f6f69c131c0dd945f8c
lrwxrwxrwx  1 mitch mitch 43 May  4 16:51 v2.6.11-tree -> ../5dc01c595e6c6ec9ccda4f6f69c131c0dd945f8c
lrwxrwxrwx  1 mitch mitch 43 May  1 17:16 v2.6.12-rc2 -> ../9e734775f7c22d2f89943ad6c745571f1930105f
lrwxrwxrwx  1 mitch mitch 43 May  1 17:15 v2.6.12-rc3 -> ../0397236d43e48e821cce5bbe6a80a1a56bb7cc3a
lrwxrwxrwx  1 mitch mitch 43 May  6 22:22 v2.6.12-rc4 -> ../ebb5573ea8beaf000d4833735f3e53acb9af844c
lrwxrwxrwx  1 mitch mitch 43 May 24 20:32 v2.6.12-rc5 -> ../06f6d9e2f140466eeb41e494e14167f90210f89d
$ cd TAGS/v2.6.11
$ ls
arch     Documentation  init    MAINTAINERS  README          sound
COPYING  drivers        ipc     Makefile     REPORTING-BUGS  usr
CREDITS  fs             kernel  mm           scripts
crypto   include        lib     net          security
$ pwd
/tmp/fuse/TAGS/v2.6.11
$ /bin/pwd
/var/tmp/fuse/5dc01c595e6c6ec9ccda4f6f69c131c0dd945f8c
$ ls -l /tmp/fuse
total 0
dr-xr-xr-x  18 mitch mitch 352 May  4 16:50 5dc01c595e6c6ec9ccda4f6f69c131c0dd945f8c
dr-xr-xr-x   2 mitch mitch   0 Apr 20 16:38 HEADS
dr-xr-xr-x   2 mitch mitch   0 May 24 20:32 TAGS

Received on Sun Jun 26 14:23:00 2005