Re: [PATCH] multi item packed files

From: Chris Mason <mason@suse.com>
Date: 2005-04-26 08:20:57
On Friday 22 April 2005 19:55, Chris Mason wrote:
> On Friday 22 April 2005 16:32, Chris Mason wrote:
> > If I pack every 64k (uncompressed), the checkout-tree time goes down to
> > 3m14s. That's a very big difference considering how stupid my code is.
> > .git was only 20% smaller with 64k chunks.  I should be able to do
> > better...I'll do one more run.
>
> This run also packed tree files together (everything produced by write-tree
> went into a packed file), but not the commits.  I estimate I could save
> about another 168m by packing the tree files and commits into the same file
> with the blobs, but this wouldn't make any of the times below faster.
>
>                    git - original (28k commits)   packed
> FS size            2,675,408k                     1,723,820k
> read-tree          24.45s                         18.9s
> checkout-cache     4m30s                          3m5s
> patch time         2h30m                          1h55m
>

It was a rainy weekend, so I took a break from lawn care and hacked in some 
simple changes to the packed file format.  There's now a header listing the 
sha1 for each subfile and the offset where to find it in the main file.  Each 
subfile is compressed individually so you don't have to decompress the whole 
packed file to find one.  Commits were added into the packed files as well.
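
For illustration only, here is a minimal Python sketch of a packed file with
that general shape: a header of (sha1, offset) entries followed by
individually compressed subfiles.  It is not the format used by the attached
patch.  The field widths, the stored compressed length, the uncompressed
header, hashing the raw contents, and the names pack, read_header and extract
are all assumptions made for the example.

```python
import hashlib
import struct
import zlib

# Hypothetical layout (NOT the patch's on-disk format):
#   4-byte big-endian entry count
#   per entry: 20-byte sha1, 8-byte offset, 8-byte compressed length
#   then the individually zlib-compressed subfiles, back to back.
COUNT = struct.Struct(">I")
ENTRY = struct.Struct(">20sQQ")


def pack(subfiles):
    """Build a packed file (bytes) from a list of byte strings."""
    compressed = [zlib.compress(data) for data in subfiles]
    offset = COUNT.size + ENTRY.size * len(subfiles)
    header = [COUNT.pack(len(subfiles))]
    for data, blob in zip(subfiles, compressed):
        # Assumption: the sha1 is taken over the raw contents.
        sha1 = hashlib.sha1(data).digest()
        header.append(ENTRY.pack(sha1, offset, len(blob)))
        offset += len(blob)
    return b"".join(header + compressed)


def read_header(packed):
    """Return {sha1: (offset, compressed_length)} parsed from the header."""
    (count,) = COUNT.unpack_from(packed, 0)
    index = {}
    for i in range(count):
        sha1, offset, length = ENTRY.unpack_from(
            packed, COUNT.size + i * ENTRY.size)
        index[sha1] = (offset, length)
    return index


def extract(packed, sha1):
    """Decompress only the subfile named by sha1, not the whole pack."""
    offset, length = read_header(packed)[sha1]
    return zlib.decompress(packed[offset:offset + length])
```

The point of the per-subfile compression is visible in extract: it touches
the header and one compressed span, and nothing else in the file.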

Some results were about what I expected:

FS size        -- 1,614,376k
read-tree      -- 18s
checkout-cache -- 2m35s (cold cache)
checkout-cache -- 18s   (hot cache)
patch time     -- 96m

Vanilla git needs 56s to check out with a hot cache.  The hot cache numbers 
weren't done before because I hadn't expected my patch to help at all.  Even 
though we both do things entirely from cache, vanilla git is much slower at 
writing the checked out files back to the drive.  I've made no optimizations 
to that code, and the drive is only 30% full, so this seems to just be a bad 
interaction with filesystem layout.
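
To put the figures quoted in this message side by side, the ratios can be
worked out with a few lines of Python.  The numbers are the ones reported
above (original git versus the current packed format); the helper name
seconds and the table layout are just choices made for this sketch.

```python
import re


def seconds(text):
    """Parse strings such as '2h30m', '4m30s', '96m' or '18.9s' into seconds."""
    h, m, s = re.fullmatch(
        r"(?:(\d+)h)?(?:(\d+)m)?(?:([\d.]+)s)?", text).groups()
    return int(h or 0) * 3600 + int(m or 0) * 60 + float(s or 0)


# Sizes in kilobytes, as reported in the message.
original_kb = 2_675_408
packed_kb = 1_614_376
print(f"FS size: {100 * (1 - packed_kb / original_kb):.1f}% smaller")

# (original git, current packed format), as reported in the message.
timings = {
    "read-tree": ("24.45s", "18s"),
    "checkout-cache (cold)": ("4m30s", "2m35s"),
    "checkout-cache (hot)": ("56s", "18s"),
    "patch time": ("2h30m", "96m"),
}
for name, (before, after) in timings.items():
    print(f"{name}: {seconds(before) / seconds(after):.2f}x faster")
```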

I also expected vanilla git to perform pretty well when there were no commits 
in the tree.  My test was to put a copy of 2.6.11 under git.
                                 vanilla    packed
update-cache (for all files)     2m1s       48s
checkout-cache (cold)            1m23s      28s
checkout-cache (hot)             12s        15s

The difference in hot cache checkout time is userland cpu time.  It could be 
avoided with smarter caching of the packed file header.  Right now I'm 
decompressing it over and over again for each checkout.  Still, the 
performance hit is pretty small because I try to limit the number of subfiles 
that get packed together.
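
One possible shape for that smarter caching, sketched in Python rather than
taken from the patch: keep the parsed header in memory keyed by the packed
file's path, so it is decompressed once per file instead of once per lookup.
The on-disk layout used here (a 4-byte length followed by a zlib-compressed
JSON map) and the names write_pack, load_header and find_offset are
hypothetical, and the sketch assumes a packed file is never rewritten in
place once created.

```python
import functools
import json
import os
import tempfile
import zlib


def write_pack(path, index):
    """Write a toy packed file holding only a compressed header.

    Hypothetical layout: 4-byte big-endian header length, then a
    zlib-compressed JSON object mapping hex sha1 -> offset.
    """
    header = zlib.compress(json.dumps(index).encode())
    with open(path, "wb") as f:
        f.write(len(header).to_bytes(4, "big"))
        f.write(header)


@functools.lru_cache(maxsize=32)
def load_header(path):
    """Decompress and parse a packed file's header, once per path.

    Assumption: packed files are immutable, so a cached header never
    goes stale.
    """
    with open(path, "rb") as f:
        size = int.from_bytes(f.read(4), "big")
        return json.loads(zlib.decompress(f.read(size)))


def find_offset(path, sha1_hex):
    """Look up a subfile's offset using the cached header."""
    return load_header(path)[sha1_hex]
```

Repeated find_offset calls against the same packed file then cost a
dictionary lookup instead of a decompression, which is the userland CPU time
described above.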

My current patch is attached for reference; it's against a git from late last 
week.  I wouldn't suggest using this for anything other than benchmarking, 
and since I don't think I can get much better numbers easily, I'll stop 
playing around with this for a while.

-chris

Received on Tue Apr 26 08:23:19 2005
