Re: CAREFUL! No more delta object support!

From: Linus Torvalds <torvalds@osdl.org>
Date: 2005-06-28 13:30:22

On Mon, 27 Jun 2005, Christopher Li wrote:
> On Mon, Jun 27, 2005 at 06:14:40PM -0700, Linus Torvalds wrote:
> > 
> > The reason? The new git understands packed files natively, which ends up 
> > being a much bigger win in many many ways.
> 
> Interesting. I took a look at your change; it still supports delta objects
> inside the pack file, right? For a second I wondered whether you had
> dropped the delta feature completely.

Deltas do exist inside pack-files, yes. They just don't exist as 
independent objects any more, so you can never get into the situation 
where you find a delta but can't find the object it points to.

That's because a pack-file only ever contains deltas _within_ that 
pack-file. You can't have a delta that points outside the pack.

This means that pack-files with few objects will inevitably be larger than
they could otherwise be (ie you can never have a pack file that _only_
contains deltas to the outside world), but it's just incredibly reassuring 
to me that a pack-file is always self-sufficient. 
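
As a rough illustration of the "self-sufficient" property, here is a minimal
sketch. It assumes a toy in-memory model where a pack is a dict of entries;
the names `external_bases` and `is_self_sufficient` and the tuple layout are
made up for the example and are not git's on-disk format.

```python
# Toy model (an assumption for illustration, not git's pack format):
# a pack maps an object id to either ("full", data) or ("delta", base_id, patch).

def external_bases(pack):
    """Return the ids of delta bases that are NOT stored in this pack."""
    missing = set()
    for entry in pack.values():
        if entry[0] == "delta" and entry[1] not in pack:
            missing.add(entry[1])
    return missing


def is_self_sufficient(pack):
    """A pack is self-sufficient when every delta's base lives in the same pack."""
    return not external_bases(pack)


# Hypothetical packs: 'good' carries the base object, 'bad' does not.
good = {
    "a1": ("full", b"hello\n"),
    "b2": ("delta", "a1", b"+world\n"),
}
bad = {
    "b2": ("delta", "a1", b"+world\n"),
}
```

In this model `good` passes the check, while `bad` is exactly the kind of
pack that is not allowed: its only entry is a delta against something
outside the pack.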

So when/if we start using pack-files for doing "git pull" etc, the 
pack-file won't actually help pack things for small updates: small updates 
will probably contain the whole changed file, unless the update has 
several changes to the same file (which is not unusual, of course), in 
which case it will only contain one version and then deltas from that.
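
To make the small-update case concrete, here is a deliberately simplified
sketch. It assumes that the first version of each file seen in an update is
stored whole and every later version of the same file is stored as a delta.
The function `plan_update` and its inputs are hypothetical, and a real
packer may pick delta bases differently.

```python
def plan_update(changes):
    """changes: list of (path, version) pairs, in the order they are packed.

    Returns a list of (path, kind), where kind is "full" for the first
    version of a path in this update and "delta" for later versions.
    This is a simplification for illustration only.
    """
    seen = set()
    plan = []
    for path, _version in changes:
        plan.append((path, "delta" if path in seen else "full"))
        seen.add(path)
    return plan


# Hypothetical update: a.c changes twice, b.c changes once.
example = plan_update([("a.c", 1), ("b.c", 1), ("a.c", 2)])
```

Here only the second version of `a.c` can be a delta. `b.c` has nothing
inside the pack to delta against, so it goes in whole.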

But the savings get increasingly bigger the more history we have. That's
also why the packed git archive is about 1/14th of the size of the fully
unpacked disk usage of the git project, but a packed kernel archive "only"  
achieves a packing rate of 1/5th of the fully unpacked kernel archive. The
git archive is all history, while the kernel archive just "appears" with
no earlier history behind it, and 2/3 of the files have only a single
version and thus don't delta-compress at all.

(Another reason is probably that the kernel has bigger files, which means
that it loses relatively less to filesystem block padding when unpacked).
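
The block padding effect is plain arithmetic, sketched below. The 4096-byte
block size and the two file sizes are assumptions picked for illustration,
not measurements from either archive.

```python
def on_disk_size(file_size, block_size=4096):
    """Bytes a file occupies once rounded up to whole filesystem blocks."""
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks * block_size


def padding_fraction(file_size, block_size=4096):
    """Fraction of the occupied disk space that is just padding."""
    used = on_disk_size(file_size, block_size)
    if used == 0:
        return 0.0
    return (used - file_size) / used


# Hypothetical object sizes: one small file, one larger file.
small = padding_fraction(500)     # occupies one 4096-byte block
large = padding_fraction(20000)   # occupies five blocks, 20480 bytes
```

With these made-up sizes the small file wastes most of its block, while the
large one wastes only 480 bytes out of 20480. That is why an archive of
mostly small files gains more from being packed into one file.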

But not having any outside deltas not only makes me feel safer, it also
means that you can fully validate a pack archive's consistency without even
knowing what project it is from - you can check the SHA1 results of every
file in the pack against the index of the pack, and check that the SHA1's
of the pack files themselves are valid. Again, this is just a data
_consistency_ check, of course - it means that you can validate that it
downloaded fine, and that you don't have disk corruption, but it doesn't
mean that the data isn't evil and nasty and buggy ;)
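
A minimal sketch of such a consistency check follows. It makes two
assumptions: object names are the SHA1 of a "type size NUL" header followed
by the content, and a file ends with a 20-byte SHA1 of everything before it.
The helper names (`object_id`, `trailer_ok`, `check_objects`) and the
in-memory inputs are invented for the example. This is not the real
verification code.

```python
import hashlib


def object_id(obj_type, content):
    """SHA1 over a header (type, space, size, NUL byte) plus the content."""
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def trailer_ok(blob):
    """True if the last 20 bytes are the SHA1 of all the bytes before them."""
    if len(blob) < 20:
        return False
    body, trailer = blob[:-20], blob[-20:]
    return hashlib.sha1(body).digest() == trailer


def check_objects(index_ids, unpacked):
    """Compare recomputed object ids against the ids listed in the index.

    index_ids: set of hex ids from the (hypothetical) index.
    unpacked:  list of (type, content) pairs after resolving deltas.
    Returns (ids found but not indexed, ids indexed but not found).
    """
    seen = {object_id(t, c) for t, c in unpacked}
    return seen - index_ids, index_ids - seen
```

Nothing here needs to know which project the objects belong to: both checks
only compare hashes against data that ships with the pack.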

			Linus
Received on Tue Jun 28 13:28:46 2005
