Re: Unexpected behavior in git-rev-list

From: Linus Torvalds <torvalds@osdl.org>
Date: 2005-09-22 03:40:42
On Wed, 21 Sep 2005, Peter Eriksen wrote:
> 
> Ok, I have a prototype.  The algorithm has three steps:
> 
> 1) traverse the commit DAG in breadth first order

The thing is, this is _expensive_.

It's very possible to cut down a lot of the costs by having logic to cut 
down the expense of looking at the whole commit dag.

In particular, almost all merges will have the same object in _one_ of the
parents as in the result. And if you just make the rule be that you only 
follow the first parent that matches the result in the merge, you'll 
almost always end up with a nice linear thing, with no need to look at 
multiple parents at all.

Of course, sometimes the merge actually _does_ merge changes from both 
(or more) sides of a commit, and then you need to follow them down and it 
gets nasty and complicated.

Anyway, I've seriously considered adding a mode to "git-rev-list" that
automatically avoids following the parents that aren't relevant for a
certain set of files.

Ie if you did

	git-rev-list rev1 rev2 ^rev3 ^rev4 .. pathname

it would only show the revisions that actually _change_ the pathname.

It's not entirely trivial. The biggest bummer is that we'd have to fake
out the parent info (ie the "parent" would have to be the previous entry
that changes it, not the real one).

I'm convinced that it's all quite possible, though, by just rewriting the
"commit->parents" list (remove parents that don't change the set of files,
and in merges where one parent has zero diffs for that set, just select
_that_ parent, and then continue to prune).

It might be best not being done by git-rev-list, but by a specialized 
program. However, the advantage of doing it in git-rev-list is that then 
things like "git log" and "gitk" would automatically take advantage of it, 
ie you could say

	gitk v2.6.12.. drivers/char/

and it would show a "cut-down" revision tree that only contained the stuff 
that changed anything in drivers/char/.

This would be (a) very useful (b) very powerful and (c) should even be
pretty efficient. Sure, systems that natively do things in a file-specific
way are still a lot more efficient on a single-file basis, but the git
architecture actually lends itself very well to the above kind of "track a
whole subdirectory" (or "track two subdirectories and one filename", or
anything like that).

And a much more efficient "annotate" would fall out automatically out of 
it (although I really think that the "gitk v2.6.12.. drivers/char/" is 
what would be a lot more useful than annotate has ever been).

We already have this in "git-whatchanged", which I personally find very 
very powerful. But we could do it even better.

		Linus
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Received on Thu Sep 22 03:41:29 2005

This archive was generated by hypermail 2.1.8 : 2005-09-22 03:41:32 EST