Re: irc usage..

From: Linus Torvalds <>
Date: 2006-05-23 06:11:18
On Tue, 23 May 2006, Martin Langhoff wrote:
> The dev machine where I am running the import is a slug! It's still
> working on it, only gotten to 7700 commits, with the cvsimport process
> stable at 28MB RAM and cvs stable at 4MB.

I have to say, that cvsimport script really does do horrible things. It's 
basically a fork/exec/exit benchmark, as far as I can tell. Running 
oprofile on the thing, the top offenders are (ignore the 45% idle thing: 
it's just because this was run on a dual-cpu system, so since it's almost 
completely single-threaded you get ~50% idle by default).

	3117654  45.8708  vmlinux                  vmlinux                  .power4_idle
	802313   11.8046  vmlinux                  vmlinux                  .unmap_vmas
	632913    9.3122  vmlinux                  vmlinux                  .copy_page_range
	150359    2.2123  vmlinux                  vmlinux                  .release_pages
	131330    1.9323  vmlinux                  vmlinux                  .vm_normal_page
	117836    1.7337                    (no symbols)
	74098     1.0902            (no symbols)
	54680     0.8045  vmlinux                  vmlinux                  .free_pages_and_swap_cache
	54300     0.7989                        (no symbols)
	49052     0.7217  vmlinux                  vmlinux                  .copy_4K_page
	46559     0.6850                  getc
	42677     0.6279  vmlinux                  vmlinux                  .page_remove_rmap
	41133     0.6052                  ferror

Those kernel functions are all about process create/exit, and COW faulting 
after the fork.

Now, this is on ppc, so process creation is likely slower (idiotic PPC VM 
page table hashes), but Linux is actually very good at doing this, and the 
fact that process create/exit is so high is a very big sign that the 
script just ends up executing a _ton_ of small simple processes that do 
almost nothing.
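
A rough way to see what one process create/exit cycle costs on a given machine is sketched below. It is an illustration only: it assumes a POSIX system with Python 3, the helper name time_fork_exit is hypothetical, and it measures fork/exit alone (no exec), so it understates what a script that spawns real programs pays per call.

```python
import os
import time


def time_fork_exit(n):
    """Fork n children that exit immediately; return mean seconds per cycle."""
    start = time.perf_counter()
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            # Child: do no work at all, exit without running cleanup handlers.
            os._exit(0)
        # Parent: reap the child before starting the next cycle.
        os.waitpid(pid, 0)
    return (time.perf_counter() - start) / n


if __name__ == "__main__":
    per_cycle = time_fork_exit(1000)
    print(f"{per_cycle * 1e6:.1f} us per fork/exit cycle")
```

Multiplying the per-cycle figure by the number of helper processes an import spawns gives a rough lower bound on the time spent purely on process management.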

I wonder why those "git-update-index" calls seem to be (assuming I read 
the perl correctly) done only a few files at a time. We can do hundreds 
in one go, but it seems to want to do just ten files or something at the 
same time. Although since most commits should hopefully just modify a 
couple of files, that probably isn't a big deal.
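
The effect of the batch size on the number of spawned processes can be sketched as follows. This is a hypothetical helper, not code from git-cvsimport: batch_commands and the batch sizes are assumptions for illustration, and the function only builds the argument lists for "git update-index --add --" without running anything.

```python
def batch_commands(paths, batch_size):
    """Split paths into git update-index invocations of at most batch_size files."""
    base = ["git", "update-index", "--add", "--"]
    return [
        base + paths[i:i + batch_size]
        for i in range(0, len(paths), batch_size)
    ]


if __name__ == "__main__":
    # Hypothetical commit touching 250 files.
    paths = [f"src/file{i}.c" for i in range(250)]
    for size in (10, 250):
        cmds = batch_commands(paths, size)
        print(f"batch size {size}: {len(cmds)} process(es)")
```

For a commit that touches only a couple of files, both batch sizes produce a single invocation, which matches the remark that the small batch size probably isn't a big deal in the common case.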

That thing would probably be an order of magnitude faster if written to 
use the git library interfaces directly. Of course, the CVS part is 
probably a big overhead, so it might not help much (I would not be 
surprised at all if a number of the fork/exec/exit things are due to the 
CVS server starting RCS or something, not due to git-cvsimport itself).

Received on Tue May 23 06:12:19 2006
