[PATCH] multi item packed files

From: Chris Mason <mason@suse.com>
Date: 2005-04-22 01:13:13
Hello,

There have been a few threads on making git more space efficient, and 
eventually someone mentions tiny files and space fragmentation.  Now that git 
object names are decoupled from their compression, it's easier to consider a 
a variety of compression algorithms.  I whipped up a really silly "pack files 
together" compression.

This would maintain the write once semantics but allow a simple mechanism 
where objects are combined together.  Choosing which objects to combine is 
easy, things put together into update-cache go together.  This gives us more 
space efficiency and no seeks when reading that packed file off disk.

A natural extension to this is to make update-cache --commit-tree, which 
includes the files produced by write-tree and commit-tree into the same 
packed file.  (I haven't coded this).

The layout works like this:

1) a new object type "packed" is added.
2) new objects are buffered into a packed object, until it gets to around 32k 
in size.  This is complete arbitrary but felt about right.
3) The packed object is writting to git storage and then hard links are made 
to the packed object from the sha1 filename of each object inside.
4) read_sha1_file is changed to recognize the packed object and search inside.

I did a simple test on the 2.6.11 tree with my 100 patches applied.  Without 
packing, .git is 99MB.  With packing it only needs 62MB:

read speeds don't suffer with this, time to read-tree ; checkout-cache -a -f 
from a cold cache were the same.  I could get the times lower with the patch 
by caching the uncompressed data, since in theory I should be faster here.

Using this on data you care about would be a really bad idea right now.  I'm 
only posting the patch to get the basic idea across for benchmarking and 
discussion.

-chris

-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Received on Fri Apr 22 01:15:32 2005

This archive was generated by hypermail 2.1.8 : 2005-04-22 01:15:32 EST