Re: [PATCH/RFC] reverse the pack-objects delta window logic

From: Nicolas Pitre <nico@cam.org>
Date: 2006-04-27 01:48:53
On Tue, 25 Apr 2006, Junio C Hamano wrote:

> Nicolas Pitre <nico@cam.org> writes:
> 
> > Note, this is a RFC particularly to Junio since the resulting pack is 
> > larger than without the patch with git-repack -a -f.  However using a 
> > subsequent git-repack -a brings the pack size down to expected size.  So 
> > I'm not sure I've got everything right.
> 
> I haven't tested it seriously yet, but there is nothing that
> looks obviously wrong that might cause the inflation problem,
> from the cursory look after applying the patch on top of your
> last round.
> 
> > +	if (nr_objects == nr_result && trg_entry->delta_limit >= max_depth)
> > +		return 0;
> 
> The older code was loosening this check only for a delta chain
> that is already in pack (which is limited to its previous
> max_depth).  The end result is almost the same -- a thin pack
> recipient would have deeper delta than it asked. The difference
> is that the earlier code had implicit 2*max_depth limit,

Ah.  Indeed.  Didn't realize that.  I can restore that behavior quite 
easily if necessary.
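For illustration, restoring that behavior could look roughly like this (a sketch with made-up names; `depth_limit_ok` and `reusing_packed_delta` are not the actual pack-objects identifiers):

```c
#include <assert.h>

/*
 * Sketch only: restore the old implicit bound so that a delta chain
 * reused from an existing pack may still grow past max_depth, but
 * never beyond 2 * max_depth, instead of being left unbounded.
 */
static int depth_limit_ok(int reusing_packed_delta, int depth, int max_depth)
{
	int limit = reusing_packed_delta ? 2 * max_depth : max_depth;
	return depth < limit;
}
```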

> but this one makes the chain length unbounded, which I do not think it 
> is necessarily a bad change.

Well, as long as the thin pack doesn't carry too many revisions it should 
be fine since, as the comment in the code says, those packs are always 
unpacked.

Initially I had a bug where the delta depth was completely ignored.  I 
was pretty excited when repacking the kernel produced a pack 20% smaller 
although I didn't know why at the time.  But when attempting another 
git-repack -a -f, the initial object counting was sooooooo 
slooooooooow.

> > -	/*
> > -	 * NOTE!
> > -	 *
> > -	 * We always delta from the bigger to the smaller, since that's
> > -	 * more space-efficient (deletes don't have to say _what_ they
> > -	 * delete).
> > -	 */
> 
> This comment by Linus still applies, even though the scan order
> is now reversed; no need to remove it.

This is not exactly true.  In general it holds, but since we fixed the 
deltification of objects with the same name in different directories, 
it is quite possible to go from a smaller to a larger object, and 
leaving that comment there is misleading.

This is also why I changed the sizediff rule such that:

	sizediff = src_size < size ? size - src_size : 0;

Since the src buffer already has its delta index computed, it costs 
almost nothing to attempt matching much smaller objects against it.  
However, if we go from smaller to larger, the previous logic still 
applies.
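In code terms the asymmetry amounts to a one-sided penalty, roughly like this (minimal illustration; `size_penalty` is a made-up name, the real code computes this inline):

```c
#include <assert.h>

/*
 * Penalize only the case where the source is smaller than the target.
 * A larger source already has its delta index built, so trying a much
 * smaller target against it is nearly free and gets no penalty.
 */
static unsigned long size_penalty(unsigned long src_size, unsigned long trg_size)
{
	return src_size < trg_size ? trg_size - src_size : 0;
}
```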

> > +	if (trg_entry->delta) {
> > +		/*
> > +		 * The target object already has a delta base but we just
> > +		 * found a better one.  Remove it from its former base
> > +		 * childhood and redetermine the base delta_limit (if used).
> > +		 */
> 
> And you are making the delta chain unbound for thin case, you
> can probably omit this with the same if() here; the
> recomputation seems rather expensive.

Ah right.  I was already doing so partially, but I can skip any tree 
maintenance altogether in that case as well.

> > +		if (!size)
> > +			continue;
> > +		delta_index = create_delta_index(n->data, size);
> > +		if (!delta_index)
> > +			die("out of memory");
> 
> It might be worth saying "if (size < 50)" here as well; no point
> wasting the delta window for small sources.

Good point.  No real effect on the pack size though.
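Something along these lines before building the index, in other words (sketch only; the 50-byte threshold is from your suggestion, and the helper name is made up since the real loop would just "continue"):

```c
#include <assert.h>

/*
 * Refuse to build a delta index for sources too small to be useful
 * bases, so they don't waste a window slot.  This folds in the
 * existing zero-size skip as well.
 */
static int worth_indexing(unsigned long size)
{
	return size >= 50;
}
```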

> > -#if 0
> > -		/* if we made n a delta, and if n is already at max
> > -		 * depth, leaving it in the window is pointless.  we
> > -		 * should evict it first.
> > -		 * ... in theory only; somehow this makes things worse.
> > -		 */
> > -		if (entry->delta && depth <= entry->depth)
> > -			continue;
> > -#endif
> 
> I was almost tempted to suggest that the degradation you are
> seeing might be related to this mystery I did not get around to
> solve.  By allowing to give chance to try delta against less
> optimum candidates, it appeared that we ended up making the
> final pack size bigger than otherwise, which suggests that our
> choice between plain undeltified and a delta half its size might
> be favoring delta too much.  But it does not appear to be
> related to the inflation you are seeing.

Certainly not, since git-repack -a may only delta _more_, and the pack 
size actually goes down a lot in my case.

The mystery I'm facing is why a second pass with git-repack -a would fix 
things.  It has different window behavior, since objects that are 
already deltified do not occupy window space.  Hmmm.  That would 
certainly explain why doing a git-repack -a after a git-repack -a -f 
produces a smaller pack even currently.

> BTW, have you tried it without --no-reuse-pack on an object list
> that is not thin?  It appears you are busting the depth limit.
> 
> Using the same "git rev-list --objects v1.2.3..v1.3.0" as input,
> git-pack-objects without --no-reuse-pack gives this
> distribution:
> 
> chain length = 1: 364 objects
> chain length = 2: 269 objects
> chain length = 3: 198 objects
> chain length = 4: 164 objects
> chain length = 5: 148 objects
> chain length = 6: 123 objects
> chain length = 7: 122 objects
> chain length = 8: 103 objects
> chain length = 9: 92 objects
> chain length = 10: 234 objects
> chain length = 11: 12 objects
> chain length = 12: 1 object
> chain length = 13: 2 objects

Oops.  OK fixed.

> So it _might_ be that the depth limiting code is subtly broken
> which is causing you throw away a perfectly good delta base
> which in turn results in a bad pack.

Actually no.  That bug instead allowed each given base to deltify more 
targets than it should have.


Nicolas