Re: [PATCH] add the ability to create and retrieve delta objects

From: Chris Mason <mason@suse.com>
Date: 2005-05-05 01:56:18
On Tuesday 03 May 2005 04:06, Nicolas Pitre wrote:
> On Mon, 2 May 2005, Linus Torvalds wrote:
> > If you do something like this, you want such a delta-blob to be named by
> > the sha1 of the result, so that things that refer to it can transparently
> > see either the original blob _or_ the "deltified" one, and will never
> > care.
>
> Yep, that's what I've done last weekend (and just made it actually
> work since people are getting interested).
>

My first run didn't go well: diff_delta generates an invalid delta when passed 
a buffer of length 0.  I really should not have been calling it this way, but 
it should do a quick check and return an error instead of something 
invalid ;)
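
As a rough illustration of the kind of check meant here (a Python sketch, not 
the C code from the patch), a caller-side guard could refuse empty input 
before the delta generator ever runs.  The names checked_diff and diff_fn are 
hypothetical, and rejecting an empty buffer on either side is an assumption; 
the message does not say which buffer was empty.

```python
from typing import Callable, Optional


def checked_diff(diff_fn: Callable[[bytes, bytes], bytes],
                 base: bytes, target: bytes) -> Optional[bytes]:
    """Run diff_fn only on non-empty buffers.

    diff_fn is a stand-in for a real delta generator (hypothetical).
    Returns None instead of a delta when either buffer is empty, so the
    caller can fall back to storing the object whole.
    """
    if len(base) == 0 or len(target) == 0:
        return None  # assumption: an empty buffer on either side is an error
    return diff_fn(base, target)
```

A caller would treat a None result as "no delta produced" and write the full 
object instead.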

I did two additional runs.  In the first, I fixed the delta chain length at 1, 
as in the zdelta patch.  In this mode, if it tried to diff against a delta it 
would find the delta's parent and diff against that instead.  Even though 
zdelta had the same speeds for applying patches as xdiff(1), zdelta used 
significantly more cpu (53m vs 40m).
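
The base-selection rule can be sketched as follows.  This is a Python 
illustration under stated assumptions, not the patch itself: Obj, chain_depth 
and pick_base are hypothetical names, and "chain length" is taken to mean the 
number of deltas between an object and the full object at the root of its 
chain.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Obj:
    name: str
    base: Optional[str] = None  # object this one is a delta against; None if stored whole


def chain_depth(store: Dict[str, Obj], name: str) -> int:
    """Count the deltas between `name` and the full object at the chain root."""
    depth = 0
    while store[name].base is not None:
        name = store[name].base
        depth += 1
    return depth


def pick_base(store: Dict[str, Obj], candidate: str, max_depth: int) -> str:
    """Pick the object to diff against so the new delta stays within max_depth.

    Starting from `candidate`, step toward the chain root until a delta
    made against the result would have a chain length <= max_depth.
    """
    if max_depth < 1:
        raise ValueError("max_depth must be at least 1")
    name = candidate
    while chain_depth(store, name) + 1 > max_depth:
        name = store[name].base
    return name
```

With max_depth set to 1, a candidate that is itself a delta is replaced by its 
parent, which is the behaviour described above; a larger limit lets the 
candidate be used directly until the chain gets too long.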

The next run was with the patch I've attached below; it allows chains up to 16 
deltas in length.  
                   git      zdelta   xdiff(1)   xdiff(16)
apply              150m     117m     117m       104m
checkout           4m30s    3m41s    4m43s      7m11s
checkout (hot)     56s      12s      14s        16s
space usage        2.5G     1G       1.2G       800M

The longer delta chains trigger more random io on checkout, negating the speed 
improvements from the packed item patch.  The hot cache times show that xdiff 
isn't using a huge amount of cpu to patch things in, and so there's room for 
smarter packing and regenerating deltas in order to keep checkout times low.  
This patch still doesn't pack commits and trees in with the blob files, and 
it doesn't delta trees, and so I expect better space/speed numbers in later 
revs.
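
To make the cost of long chains concrete, here is a toy Python model (not the 
patch, and not the xdiff delta format) in which every link in a chain costs 
one object read.  The Store class, its read counter, and the copy/insert 
operation tuples are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def apply_delta(base: bytes, ops: List[Tuple]) -> bytes:
    """Rebuild a target from `base` using toy copy/insert operations."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]  # reuse bytes from the base
        else:
            out += op[1]  # ("insert", data): literal bytes not in the base
    return bytes(out)


class Store:
    """Toy object store that counts how many objects each read touches."""

    def __init__(self) -> None:
        self.objects: Dict[str, Tuple] = {}
        self.reads = 0

    def put_full(self, name: str, data: bytes) -> None:
        self.objects[name] = ("full", data)

    def put_delta(self, name: str, base: str, ops: List[Tuple]) -> None:
        self.objects[name] = ("delta", base, ops)

    def read(self, name: str) -> bytes:
        self.reads += 1  # one (potentially random) disk read per object
        entry = self.objects[name]
        if entry[0] == "full":
            return entry[1]
        _, base, ops = entry
        return apply_delta(self.read(base), ops)
```

In this model an object at the end of a chain of n deltas needs n + 1 reads to 
reconstruct, so a limit of 16 can cost up to 17 reads for a single file where 
a limit of 1 costs at most 2.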

I won't be able to work on this until next week, but here's my plan:

1) update to current git.  My patch is from before the safe file generation 
changes.

2) change update-cache and write-tree so that packing/deltas are off by 
default.  Add --packed and --delta options to both.

3) create a git-pack tool that can pack/unpack existing changesets, trees and 
files, optionally adding/removing deltas.

My current code should preserve the delta object header used by Nicolas, and 
it removes all knowledge of deltas from the packed item headers.  This is not 
quite as efficient, but the resulting code is much cleaner.  I haven't tried 
it, but it should be able to read a file created by his mkdelta.c.

-chris
