Re: Creating objects manually and repack

From: Shawn Pearce <spearce@spearce.org>
Date: 2006-08-05 15:46:55
Shawn Pearce <spearce@spearce.org> wrote:
> Jon Smirl <jonsmirl@gmail.com> wrote:
> > On 8/5/06, Martin Langhoff <martin.langhoff@gmail.com> wrote:
> > >On 8/5/06, Jon Smirl <jonsmirl@gmail.com> wrote:
> > >> On 8/4/06, Linus Torvalds <torvalds@osdl.org> wrote:
> > >> > and you're basically all done. The above would turn each *,v file into a
> > >> > *-<sha>.pack/*-<sha>.idx file pair, so you'd have exactly as many
> > >> > pack-files as you have *,v files.
> > >>
> > >> I'll end up with 110,000 pack files.
> > >
> > >Then just do it every 100 files, and you'll only have 1,100 pack
> > >files, and it'll be fine.
> > 
> > This is something that has to be tuned. If you wait too long
> > everything spills out of RAM and you go totally IO bound for days. If
> > you do it too often you end up with too many packs and it takes a day
> > to repack them.
> > 
> > If I had a way to pipe all of the objects into repack one at a
> > time without repack doing multiple passes none of this tuning would be
> > necessary. In this model the standalone objects never get created in
> > the first place. The fastest IO is IO that has been eliminated.
> 
> I'm almost done with what I'm calling `git-fast-import`.

OK, now I'm done.  I'm attaching the code.  Toss it into the Makefile
as git-fast-import and recompile.

I tested it with the following Perl script, feeding the Perl script
a list of files that I wanted blobs for on STDIN:

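	use strict;
	use warnings;

	binmode STDOUT;
	while (my $path = <>) {
		chomp $path;	# strip only the trailing newline
		open(my $fh, '<', $path) or die "cannot open $path: $!";
		binmode $fh;
		# 4-byte native unsigned length, then the raw file contents
		print pack('L', -s $fh);
		my $buf;
		print $buf while read($fh, $buf, 128*1024) > 0;
		close $fh;
	}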

This gave me an execution order of:

	find . -name '*.c' | perl test.pl | git-fast-import in.pack
	git-index-pack in.pack

at which point in.pack claims to be a completely valid pack with an
index of in.idx.  Move these into .git/objects/pack, generate trees
and commits, and run git-repack -a -d.  If you feed the objects to
git-fast-import in a reasonable order (one RCS file at a time, most
recent revision first), you may not get any major benefit from
using -f during your final repack.
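
For anyone generating the stream from something other than Perl, here
is a minimal Python sketch of the same framing the test script above
produces: a 4-byte native-endian unsigned length followed by the raw
blob bytes, emitted one file at a time with the most recent revision
first.  The names write_blob, write_in_import_order and read_blobs,
and the revisions_by_file mapping, are hypothetical and made up for
this illustration.  The reader is only there to sanity-check a stream
you generated; it is not a description of how git-fast-import parses
its input.

```python
import io
import struct


def write_blob(out, data):
    """Write one record: 4-byte native-endian unsigned length, then the bytes.

    "=L" is used so the length is always 4 bytes, like Perl's pack('L').
    """
    out.write(struct.pack("=L", len(data)))
    out.write(data)


def write_in_import_order(out, revisions_by_file):
    """Emit blobs one file at a time, most recent revision first.

    revisions_by_file is a hypothetical mapping (an assumption of this
    sketch): path -> list of (timestamp, blob_bytes).
    """
    for path in sorted(revisions_by_file):
        newest_first = sorted(
            revisions_by_file[path], key=lambda rev: rev[0], reverse=True
        )
        for _timestamp, data in newest_first:
            write_blob(out, data)


def read_blobs(stream):
    """Yield each blob from a length-prefixed stream (for checking only)."""
    while True:
        header = stream.read(4)
        if not header:
            return  # clean end of stream
        if len(header) != 4:
            raise ValueError("truncated length header")
        (size,) = struct.unpack("=L", header)
        data = stream.read(size)
        if len(data) != size:
            raise ValueError("truncated blob data")
        yield data
```

Writing to sys.stdout.buffer instead of an io.BytesIO should let the
output be piped the same way as the Perl script's.  Note that the
4-byte length limits each blob to just under 4 GiB.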

The code for git-fast-import could probably be tweaked to accept
trees and commits too, which would permit you to stream the entire
CVS repository into a single pack file.  :-)

I can't help you decompress the RCS files faster, but hopefully
this will help you generate the GIT pack faster.  Hopefully you
can make use of it!

-- 
Shawn.


Received on Sat Aug 05 15:47:45 2006
