Re: [PATCH] Add '--create-index' to git-unpack-objects

From: Linus Torvalds <torvalds@osdl.org>
Date: 2005-10-13 01:20:04

On Wed, 12 Oct 2005, Sergey Vlasov wrote:
> 
> Hmm, pack-objects.c:write_one() does exactly the opposite - it writes
> the base object _after_ writing out the delta (but it does not ensure
> that ordering completely, so references to base objects can be
> pointing in both directions).  Why?

pack-objects.c is actually going to some trouble to make sure that the 
resulting pack has an "optimal" layout for the common case of accessing 
the most recent history.

Not that I have actually verified optimality, but it was _meant_ to be 
that way. And my limited tests seemed to agree.

So it writes out all objects in "recency order", which is the order it 
gets them from git-rev-list: it's the same order as the objects are 
discovered when we traverse the history in time (except all commits come 
first, since most operations will traverse the commit history more than 
they will traverse the rest of the object links).

So the objects that are reachable in the most recent tree are all supposed 
to be at the beginning of the pack-file, just after the commits.

Now, think about what happens if such an object is a delta against 
something else...

In other words, if the most recent tree contains a delta against a much 
older object, we want not only the _delta_ to be early in the pack-file, 
we want the object that it is a delta _against_ to be there too (just 
_after_ the delta, to be exact: we obviously read the delta first, so it 
should come first in the pack).
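
A minimal sketch of that ordering rule, in Python rather than the actual C.
The names pack_write_order, objects and delta_base are made up for
illustration and are not the real pack-objects.c interface. It assumes the
objects arrive in recency order (commits first) and that each delta knows
its base:

```python
def pack_write_order(objects, delta_base):
    """Return the order in which objects would be written to the pack.

    objects:    object ids in recency order (hypothetical input).
    delta_base: dict mapping a delta's id to the id of its base object.
    """
    written = set()
    order = []

    def write_one(obj):
        # Each object is written only once, at the first place it is needed.
        if obj in written:
            return
        written.add(obj)
        order.append(obj)
        # If this object is a delta, write its base right after it.
        base = delta_base.get(obj)
        if base is not None:
            write_one(base)

    for obj in objects:
        write_one(obj)
    return order


# A recent blob stored as a delta against a much older blob.
objects = ["commit2", "commit1", "tree2", "blob_new", "tree1", "blob_old"]
delta_base = {"blob_new": "blob_old"}
order = pack_write_order(objects, delta_base)
# blob_old is pulled forward to sit just after blob_new.
```
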

The point being that if you unpack the latest tree (i.e. "git checkout" or 
any of the normal "git diff" behaviour), the pack-file will basically be 
walked densely and linearly, starting roughly from the beginning. That is 
the optimal IO pattern: dense and ascending reads.
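
To make "dense" a bit more concrete, here is a toy comparison in Python. The
helper read_span and the two example layouts are assumptions made up for
illustration; real packs are measured in byte offsets, not list positions:

```python
def read_span(order, needed):
    """Return (first, last, count) of the pack positions that must be read.

    order:  object ids in the order they sit in the pack (hypothetical).
    needed: ids required to unpack the latest tree, including delta bases.
    """
    positions = sorted(order.index(obj) for obj in needed)
    return positions[0], positions[-1], len(positions)


# Objects needed to check out the latest tree; blob_new is a delta
# against blob_old, so the base is needed as well.
needed = {"commit2", "tree2", "blob_new", "blob_old"}

# Layout with the base written just after the delta.
base_after_delta = ["commit2", "commit1", "tree2", "blob_new", "blob_old", "tree1"]
# Layout with the base left at its own (old) position.
base_left_behind = ["commit2", "commit1", "tree2", "blob_new", "tree1", "blob_old"]

span_near = read_span(base_after_delta, needed)
span_far = read_span(base_left_behind, needed)
```

The same four objects are read in both layouts, but in the first one they
fit in positions 0..4, while in the second the reads stretch out to
position 5. In a real history the old base could be arbitrarily far away.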

Now, if the object is reachable through some recent branch, but the delta 
is not, then that is not true. In that case, you want to write the recent 
base object early in the pack-file, but you do _not_ want to write the 
delta together with it, because that would be the wrong thing for the 
"recent head" case: it would add stuff to the beginning of the pack-file 
that isn't needed for recent objects.
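
The two cases can be put side by side in a small Python sketch. Again, the
function names and object ids are hypothetical, and this only models the
ordering rule described above, not the real code:

```python
def pack_write_order(objects, delta_base):
    """Write objects in recency order; a delta drags its base along."""
    written, order = set(), []

    def write_one(obj):
        if obj in written:
            return
        written.add(obj)
        order.append(obj)
        base = delta_base.get(obj)
        if base is not None:
            write_one(base)  # base goes right after the delta

    for obj in objects:
        write_one(obj)
    return order


def base_direction(order, delta_base):
    """For each delta, say whether its base sits before or after it."""
    pos = {obj: i for i, obj in enumerate(order)}
    return {
        delta: "after" if pos[base] > pos[delta] else "before"
        for delta, base in delta_base.items()
    }


objects = ["commit2", "commit1", "tree2", "blob_new", "tree1", "blob_old"]

# Case 1: the recent blob is a delta against the old one.
recent_delta = {"blob_new": "blob_old"}
order1 = pack_write_order(objects, recent_delta)
dir1 = base_direction(order1, recent_delta)

# Case 2: the recent blob is the base, the old blob is the delta.
old_delta = {"blob_old": "blob_new"}
order2 = pack_write_order(objects, old_delta)
dir2 = base_direction(order2, old_delta)
```

In case 1 the old base is pulled forward next to the recent delta. In case 2
nothing moves: the old delta stays at the end and simply points backwards at
its base, which is why the references end up going in both directions.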

So that's why it's an asymmetric thing. The preferred ordering of time 
breaks the symmetry.

			Linus
Received on Thu Oct 13 01:21:14 2005
