Re: git-rev-list: "--bisect" flag

From: Linus Torvalds <torvalds@osdl.org>
Date: 2005-06-19 13:43:04
On Sun, 19 Jun 2005, Jon Seymour wrote:
> 
> Perhaps in answering this question, you can help me understand the
> intended behaviour of your bisection algorithm a little better. The
> question is this: for your purposes, given a rev-list output, why not
> simply use the literal middle element of the outputed list?

The literal middle tends to be in the middle _timewise_, but that's
irrelevant. It might be just a single step away from one of the commits
that has already been marked "good", and as such, testing that one might
well be totally pointless: if you have a thousand commits (not at all
unlikely), testing whether they are good or not one at a time is just not
reasonable.

What we want to get is the commit that basically bisects the other commits 
from a "reachability graph" angle. We want something that when selected as 
the new "top" will have roughly half the commits in it.

Put another way: let's say that we have those thousand commits that _may_ 
have introduced a bug, but we don't know which one. What do we want? We 
want to find a new head that contains roughly five hundred of the unknown 
commits. No more, no less. Maybe we won't find a  500/500 split, but we 
definitely want a 600/400 split over a 1/999 split, since the 1/999 split 
isn't likely to get us any closer to the answer.

Now, I say "roughly", because there may not _be_ any commit that exactly 
bisects the case. For a trivial case, let's say that we know that 'A' is 
bad, and 'B' is good, but the graph in between them looks like this:


                   A
                 / |
                a  | 
                |  b
                |  | \
                c  d  e
                 \ | /
                   B

so the bug might have been introduced by any of a-e, and we just don't
know which one was the cause.

Now, selecting either 'a' or 'b' as the new tip is a good choice, 'a'
reaches two unknowns (a+c) while b reaches 3 unknowns (b+d+e) but the
others only reach a single unknown.

And notice how the two good bisection places were both _recent_. They were 
not in the middle "time-wise".

Also note that we do NOT want to reach the maximal amount of commits: we 
want to reach _half_ the commits. If we select a commit that reaches a lot 
of other unknown commits ('A' is the maximal here, of course), then we're 
not actually bisecting the thing at all: we're just testing close to a 
"known bad" case.

But let's say that 'c' didn't actually exist (or, alternatively, we have
tested that one and know that "B+c" is good, which basically makes 'c'
irrelevant for bug-hunting): in that case there is no good bisection,
since whichever of the remaining ones we choose (a,b,d or e) we'll always
have split it up so that we basically test the addition or subtraction of
just one single commit. There's not anything "--bisect" can do about that, 
and it will just happen to pick one of them. And that's what I mean
by "there may not be any commit that is nicely in the middle".

		Linus
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Received on Sun Jun 19 13:41:20 2005

This archive was generated by hypermail 2.1.8 : 2005-06-19 13:41:21 EST