Re: Compression and dictionaries

From: Jon Smirl <jonsmirl@gmail.com>
Date: 2006-08-15 04:48:56

On 8/14/06, David Lang <dlang@digitalinsight.com> wrote:
> On Mon, 14 Aug 2006, Jon Smirl wrote:
>
> > On 8/14/06, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> >> I still think that this is important to think through: Is it worth a
> >> couple of kilobytes (I doubt that it would be as much as 1MB in _total_),
> >> and be on the unsafe side?
> >
> > The only "unsafe" aspect I see to this is if the global dictionary
> > doesn't contain any of the words in the documents being encoded. In
> > that case the global dictionary will occupy the short Huffman keys,
> > forcing longer internal keys. The keys for the words in the document
> > would be longer by about a bit on average.
>
> the other factor that was mentioned was that a single-bit corruption in the
> dictionary would make the entire pack file useless. if this is really a concern
> then just store multiple copies of the dictionary. on a pack with lots of files
> in it it can still be a significant win.
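
For a sense of scale on the "about a bit on average" estimate quoted
above, here is a minimal Python sketch. Everything in it is an assumption
made for illustration: the word counts, the two global dictionary entries,
and the helper names huffman_code_lengths and average_bits. It is not how
git or zlib builds its tables; it only compares a Huffman code built from
the document's own words with one that also has to make room for global
dictionary words the document never uses.

```python
import heapq
from itertools import count


def huffman_code_lengths(weights):
    """Return {symbol: code length in bits} for a Huffman code over `weights`."""
    if len(weights) == 1:
        return {sym: 1 for sym in weights}
    tiebreak = count()  # unique counter so heap entries never compare lists
    heap = [(w, next(tiebreak), [sym]) for sym, w in weights.items()]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in weights}
    while len(heap) > 1:
        w1, _, syms1 = heapq.heappop(heap)
        w2, _, syms2 = heapq.heappop(heap)
        # Every symbol under the merged node moves one level deeper.
        for sym in syms1 + syms2:
            lengths[sym] += 1
        heapq.heappush(heap, (w1 + w2, next(tiebreak), syms1 + syms2))
    return lengths


def average_bits(doc_counts, lengths):
    """Average code length per word, weighted by the document's own counts."""
    total = sum(doc_counts.values())
    return sum(n * lengths[word] for word, n in doc_counts.items()) / total


# Hypothetical data: words in the document, and global dictionary words
# that carry heavy weights but never occur in this document.
doc_counts = {"foo": 4, "bar": 2, "baz": 1, "qux": 1}
global_dict = {"the": 8, "and": 8}

local_only = average_bits(doc_counts, huffman_code_lengths(doc_counts))
combined = {**global_dict, **doc_counts}
with_global = average_bits(doc_counts, huffman_code_lengths(combined))

print(f"local table only:   {local_only:.2f} bits/word")
print(f"with unused global: {with_global:.2f} bits/word")
```

In this toy case the document's words average 1.75 bits with a local-only
table and 2.75 bits once the unused global entries take the short keys, so
the penalty is one bit per word. Real numbers depend entirely on the
weights involved.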

Bit errors can mess the pack up in lots of ways. If one hits a commit,
you won't be able to follow the tree back in time. Packs were never
designed to be error tolerant.
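
To illustrate the commit case, here is a rough Python sketch. It assumes
only that an object is named by the SHA-1 of a "<type> <size>" header, a
NUL byte, and the object body; it ignores the pack format itself
(compression, deltas, the index). The commit contents, the dictionary used
as an object store, and the flip_bit helper are all made up for the
example.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 over '<kind> <size>', a NUL byte, then the body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of `data` with a single bit inverted (hypothetical helper)."""
    damaged = bytearray(data)
    damaged[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(damaged)


# Illustrative parent commit; author, dates and message are invented.
parent_body = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author A U Thor <author@example.com> 1155600000 +0000\n"
    b"committer A U Thor <author@example.com> 1155600000 +0000\n"
    b"\n"
    b"first commit\n"
)
parent_id = git_object_id("commit", parent_body)

# The child records its parent by id, which is how history is walked.
child_body = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    + f"parent {parent_id}\n".encode()
    + b"author A U Thor <author@example.com> 1155600100 +0000\n"
    + b"committer A U Thor <author@example.com> 1155600100 +0000\n"
    + b"\n"
    + b"second commit\n"
)

# Simulate one flipped bit in the stored parent commit.
damaged_parent = flip_bit(parent_body, 0)
damaged_id = git_object_id("commit", damaged_parent)
damaged_store = {damaged_id: damaged_parent}

# The id recorded in the child no longer matches anything in the store.
parent_reachable = parent_id in damaged_store
print("parent reachable after bit flip:", parent_reachable)
```

The child still names the original id, but the damaged bytes hash to a
different one, so the walk back through history stops at that commit.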

-- 
Jon Smirl
jonsmirl@gmail.com
Received on Tue Aug 15 04:49:36 2006
