Computing delta sizes in pack files

From: Shawn Pearce <spearce@spearce.org>
Date: 2006-11-21 16:39:42

Recently I wanted to know how well Git's pack files were doing at
storing rather large JAR files.  So I wrote the attached script to
parse the output of `git verify-pack -v` and use that to determine
how many bytes are needed for each revision of any given file.
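
For anyone reading without the attachment, here is a rough sketch of
the parsing step in Python.  It is not the Perl script itself.  It
assumes the column layout documented for current `git verify-pack -v`
(SHA-1, type, size, size-in-packfile, offset-in-packfile, then an
optional depth and base SHA-1 for deltified objects); the function
names parse_verify_pack and packed_total are made up for illustration.

```python
import re

# One object line of `git verify-pack -v` (assumed current layout):
#   SHA-1 type size size-in-packfile offset-in-packfile [depth base-SHA-1]
OBJECT_LINE = re.compile(
    r"^(?P<sha>[0-9a-f]{40,64}) +(?P<type>\w+) +(?P<size>\d+)"
    r" +(?P<packed>\d+) +(?P<offset>\d+)"
    r"(?: +(?P<depth>\d+) +(?P<base>[0-9a-f]{40,64}))?$"
)


def parse_verify_pack(text):
    """Map object ID -> (type, bytes used in the pack, delta depth)."""
    objects = {}
    for line in text.splitlines():
        m = OBJECT_LINE.match(line.strip())
        if m is None:
            # Summary lines ("non delta: ...", "...: ok") are skipped.
            continue
        # Non-deltified objects have no depth column; treat them as depth 0.
        depth = int(m.group("depth")) if m.group("depth") else 0
        objects[m.group("sha")] = (m.group("type"), int(m.group("packed")), depth)
    return objects


def packed_total(objects, shas):
    """Sum the in-pack bytes for the given object IDs (hypothetical helper)."""
    return sum(objects[sha][1] for sha in shas)
```

Mapping a path such as builtin-blame.c to the blob IDs of its
revisions is a separate step that this sketch leaves out.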

For example, running the script on builtin-blame.c:

  $ perl ../delta-sizes.pl builtin-blame.c 
  Caching cache-cdc41646a9de201b06a936fc3bddcbd51aeb532c.v...
  Pack index cache created.
  
  builtin-blame.c
  16660221... s 2         44
  066dee74... s 1         62
  176f51a4...   0      12797
  ----------------------------------------
           3 revs         12 KiB
  
           3 revs         12 KiB

There are 3 revisions of this file, totalling 12 KiB of disk space
within the pack files.  Two of those revisions are tiny: one uses
44 bytes and the other 62 bytes.  The third, 176f51a4, takes the
remaining 12797 bytes.  Given that these figures include the complete
overhead (including the 20 byte OBJ_REF_DELTA header) we're talking
about ~20 bytes of delta data in revision 16660221.  Pretty good.  :)
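
The arithmetic behind those numbers can be checked with a few lines of
Python.  This is only a sketch using the sizes from the output above;
that the script truncates to whole KiB is an assumption based on the
"12 KiB" it printed, and the variable names are made up.

```python
# Bytes used in the pack per revision, copied from the output above.
sizes = {"16660221": 44, "066dee74": 62, "176f51a4": 12797}

# The 20 byte base object name carried by an OBJ_REF_DELTA entry.
REF_DELTA_BASE_ID_BYTES = 20

total_bytes = sum(sizes.values())
total_kib = total_bytes // 1024  # assumed: truncate to whole KiB

# What is left of the 44 byte entry once the base object name is removed.
delta_payload = sizes["16660221"] - REF_DELTA_BASE_ID_BYTES

print(total_bytes)    # 12903
print(total_kib)      # 12
print(delta_payload)  # 24
```

The 24 bytes left over still include the entry's small type/size
header, which is why the delta data itself comes out a little lower.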

Of course this only looks at a single blob object and does not take
into account the tree and commit overheads for a given revision,
but it does give a really good idea of what is going on.

-- 
Shawn.

