Shawn Pearce <spearce@spearce.org> wrote:
> Jon Smirl <jonsmirl@gmail.com> wrote:
> > On 8/5/06, Martin Langhoff <martin.langhoff@gmail.com> wrote:
> > >On 8/5/06, Jon Smirl <jonsmirl@gmail.com> wrote:
> > >> On 8/4/06, Linus Torvalds <torvalds@osdl.org> wrote:
> > >> > and you're basically all done. The above would turn each *,v file into a
> > >> > *-<sha>.pack/*-<sha>.idx file pair, so you'd have exactly as many
> > >> > pack-files as you have *,v files.
> > >>
> > >> I'll end up with 110,000 pack files.
> > >
> > >Then just do it every 100 files, and you'll only have 1,100 pack
> > >files, and it'll be fine.
> >
> > This is something that has to be tuned. If you wait too long
> > everything spills out of RAM and you go totally IO bound for days. If
> > you do it too often you end up with too many packs and it takes a day
> > to repack them.
> >
> > If I had a way to pipe the all of the objects into repack one at a
> > time without repack doing multiple passes none of this tuning would be
> > necessary. In this model the standalone objects never get created in
> > the first place. The fastest IO is IO that has been eliminated.
>
> I'm almost done with what I'm calling `git-fast-import`.

OK, now I'm done. I'm attaching the code. Toss it into the Makefile
as git-fast-import and recompile.

I tested it with the following Perl script, feeding the Perl script
a list of files that I wanted blobs for on STDIN:

    while (<>) {
        chop;
        print pack('L', -s $_);
        open(F, $_);
        my $buf;
        print $buf while read(F,$buf,128*1024) > 0;
        close F;
    }

This gave me an execution order of:

    find . -name '*.c' | perl test.pl | git-fast-import in.pack
    git-index-pack in.pack

at which point in.pack claims to be a completely valid pack with an
index of in.idx. Move these into .git/objects/pack, generate trees
and commits, and run git-repack -a -d.
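
For anyone not driving this from Perl, the stream that the test script
writes can be sketched as below. This is only a sketch, and it assumes the
input format is exactly what the script above emits: for each file, a
4-byte unsigned length in native byte order (what Perl's pack('L')
produces), followed by that many bytes of raw content. The name
write_stream is hypothetical and is not part of git-fast-import.

```python
import os
import struct

CHUNK = 128 * 1024  # same read size as the Perl test script


def write_stream(paths, out):
    """Write one length-prefixed record per path to the binary stream `out`.

    Assumed record layout (inferred from the Perl script, not from the
    git-fast-import source): 4-byte unsigned length in native byte order,
    then the raw file bytes.
    """
    for path in paths:
        with open(path, "rb") as f:
            size = os.fstat(f.fileno()).st_size
            # "=L" is a 32-bit unsigned int in native byte order; plain "L"
            # would be 8 bytes on many 64-bit platforms.
            out.write(struct.pack("=L", size))
            while True:
                buf = f.read(CHUNK)
                if not buf:
                    break
                out.write(buf)
```

Called with the file names read from stdin and sys.stdout.buffer as the
output, this would stand in for "perl test.pl" in the pipeline above.
Unlike the Perl version, struct.pack raises an error for files of 4 GiB
or more instead of silently truncating the length.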

If the order you feed the objects to git-fast-import in is reasonable
(do one RCS file at a time, feed most recent to least recent revisions)
you may not get any major benefit from using -f during your final
repack.

The code for git-fast-import could probably be tweaked to accept
trees and commits too, which would permit you to stream the entire
CVS repository into a single pack file. :-)

I can't help you decompress the RCS files faster, but hopefully this
will help you generate the GIT pack faster. Hopefully you can make use
of it!

--
Shawn.