Re: Why so much time in the kernel?

From: Jon Smirl <jonsmirl@gmail.com>
Date: 2006-06-17 03:44:27
On 6/16/06, Keith Packard <keithp@keithp.com> wrote:
> On Fri, 2006-06-16 at 13:00 -0400, Jon Smirl wrote:
> > Is it a crazy idea to read the cvs files, compute an sha1 on each
> > expanded delta and then write the delta straight into a pack file? Are
> > the cvs and git delta formats the same? What about CVS's forward and
> > reverse delta use?
>
> At this point, merging blobs into packs isn't a significant part of the
> computational cost. parsecvs is spending all of its time in the
> quadratic traversal of the diff chains; fixing that to emit all of the
> versions in a single pass should speed up that part of the conversion
> process dramatically.
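Keith's point about the quadratic traversal can be made concrete. RCS keeps the trunk head revision in full and stores each older trunk revision as a reverse delta against its successor. Rebuilding revision k from the head therefore takes k delta applications, and rebuilding every revision that way takes O(n²). Walking the chain once and emitting each version as it appears takes only n. This is a minimal sketch, not parsecvs code. It assumes trunk only and a simplified delta format: a list of strings holding `dN M` / `aN M` commands, with each `a` command followed by its M text lines. The function names `apply_rcs_delta` and `trunk_versions` are hypothetical.

```python
from typing import Iterator, List


def apply_rcs_delta(lines: List[str], delta: List[str]) -> List[str]:
    """Apply one simplified RCS delta ("dN M" / "aN M" commands) to `lines`.

    Line numbers are 1-based, refer to `lines` (the revision the delta is
    applied to), and the commands are assumed to be in ascending order.
    """
    out: List[str] = []
    pos = 0  # number of source lines already copied or deleted
    i = 0
    while i < len(delta):
        cmd = delta[i]
        op = cmd[0]
        start, count = map(int, cmd[1:].split())
        if op == "d":
            # Copy untouched lines before line `start`, then skip `count` lines.
            out.extend(lines[pos:start - 1])
            pos = start - 1 + count
            i += 1
        elif op == "a":
            # Copy lines up to and including line `start`, then insert new text.
            out.extend(lines[pos:start])
            pos = start
            out.extend(delta[i + 1:i + 1 + count])
            i += 1 + count
        else:
            raise ValueError(f"unknown delta command: {cmd!r}")
    out.extend(lines[pos:])  # copy whatever follows the last command
    return out


def trunk_versions(head: List[str], deltas: List[List[str]]) -> Iterator[List[str]]:
    """Yield the head revision, then each older trunk revision, in one pass.

    Each delta turns the previously yielded revision into the next older one,
    so n revisions cost n delta applications instead of O(n^2).
    """
    current = head
    yield current
    for delta in deltas:
        current = apply_rcs_delta(current, delta)
        yield current


if __name__ == "__main__":
    head = ["a", "b", "c"]
    deltas = [["d2 1"], ["a1 1", "x"]]
    for rev in trunk_versions(head, deltas):
        print(rev)
    # ['a', 'b', 'c']
    # ['a', 'c']
    # ['a', 'x', 'c']
```

Each yielded version could be hashed and handed to git as soon as it is produced, so no revision ever has to be rebuilt from the head a second time.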

That's not true for the state I am in. cvsps can compute the changeset
tree in 15 minutes, cvs2svn can compute their version in a couple of
hours. cvs2svn builds a much better tree.

I've been extracting versions from cvs and adding them to git now for
2.5 days and the process still isn't finished. It is completely CPU
bound. It's just a loop of cvs co, add it to git, make tree, commit,
etc.

> >  While this is going on, track the
> > branches/changesets in memory and then finish up by writing these trees
> > into the pack file too. This should take no more ram than cvsps needs
> > currently.
>
> cvsps drops too much state on the floor making branch point and branch
> contents inaccurate. What I'm hoping is that I can figure out a way to
> discard most of the per-version information by computing tree objects in
> reverse order, saving only the tree sha1 and other per-commit info, then
> stitch the commits together using that, without needing the full
> per-file data.
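Keith's reverse-order plan relies on a tree id being a pure function of its entries. Once the blob ids for a snapshot are known, the tree sha1 can be computed and the per-file data dropped. The sketch below follows git's object hashing scheme: a `<type> <size>\0` header is hashed together with the body, and each tree entry is `<mode> <name>\0<20-byte id>`. It assumes a flat directory of regular files (mode 100644, no subdirectories). The names `git_object_id`, `tree_id`, and `CommitInfo` are hypothetical, and this is not parsecvs code.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict


def git_object_id(obj_type: str, body: bytes) -> bytes:
    """Raw 20-byte SHA-1 of a git object: header "<type> <size>\\0" + body."""
    header = b"%s %d\0" % (obj_type.encode(), len(body))
    return hashlib.sha1(header + body).digest()


def tree_id(files: Dict[str, bytes]) -> bytes:
    """Hash a flat tree built from {filename: raw 20-byte blob id}.

    Assumes regular files only (mode 100644) and no subdirectories, so a
    plain byte-wise sort of the names gives git's entry order.
    """
    body = b"".join(
        b"100644 " + name.encode() + b"\0" + files[name]
        for name in sorted(files, key=str.encode)
    )
    return git_object_id("tree", body)


@dataclass
class CommitInfo:
    """Hypothetical per-commit record kept after per-file data is discarded."""

    tree: bytes  # 20-byte tree id; enough to write the commit object later
    author: str
    timestamp: int
    log: str


if __name__ == "__main__":
    files = {
        "README": git_object_id("blob", b"hello\n"),
        "Makefile": git_object_id("blob", b"all:\n"),
    }
    info = CommitInfo(tree=tree_id(files), author="someone", timestamp=0, log="import")
    print(info.tree.hex())
    print(tree_id({}).hex())  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904, git's empty tree
```

Stitching commits together afterwards then needs only the list of `CommitInfo` records and their parent links, not the file contents.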

I agree cvsps is dropping a lot. My screen is full of
"Skipping #CVSPS_NO_BRANCH", "Skipping SpiderMonkey140_NES40Rtm_Branch",
"Skipping SpiderMonkey140_BRANCH", etc.

What about the cvs2svn algorithm described in the attachment? A
RAM-based version could be faster. Compression could be achieved by
identifying each version by its sha1 instead of by its full path.
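One reading of the compression idea: stop repeating full path strings in every in-memory record and refer to each file version by a fixed 20-byte SHA-1 instead. If that SHA-1 is the git blob id, it is the same value that hashing each expanded revision would produce, and identical contents share one key. This is a minimal sketch under that reading. `blob_sha1` and `VersionTable` are hypothetical names, not cvs2svn code.

```python
import hashlib
from typing import Dict, List, Tuple


def blob_sha1(content: bytes) -> bytes:
    """Raw 20-byte id git assigns to `content` as a blob object."""
    return hashlib.sha1(b"blob %d\0" % len(content) + content).digest()


class VersionTable:
    """Hypothetical RAM index from (path, CVS revision) to a 20-byte blob id.

    Paths live only in this index; changeset records elsewhere can hold
    just the fixed-size ids.
    """

    def __init__(self) -> None:
        self._ids: Dict[Tuple[str, str], bytes] = {}

    def add(self, path: str, revision: str, content: bytes) -> bytes:
        sha = blob_sha1(content)
        self._ids[(path, revision)] = sha
        return sha

    def lookup(self, path: str, revision: str) -> bytes:
        return self._ids[(path, revision)]


if __name__ == "__main__":
    table = VersionTable()
    # A changeset becomes a list of 20-byte ids rather than path strings.
    changeset: List[bytes] = [
        table.add("src/a.c", "1.2", b"int a;\n"),
        table.add("src/b.c", "1.7", b"int b;\n"),
    ]
    print([sha.hex() for sha in changeset])
```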

-- 
Jon Smirl
jonsmirl@gmail.com

Received on Sat Jun 17 03:45:11 2006