On Sun, 19 Jun 2005, Jon Seymour wrote: > > Perhaps in answering this question, you can help me understand the > intended behaviour of your bisection algorithm a little better. The > question is this: for your purposes, given a rev-list output, why not > simply use the literal middle element of the outputed list? The literal middle tends to be in the middle _timewise_, but that's irrelevant. It might be just a single step away from one of the commits that has already been marked "good", and as such, testing that one might well be totally pointless: if you have a thousand commits (not at all unlikely), testing whether they are good or not one at a time is just not reasonable. What we want to get is the commit that basically bisects the other commits from a "reachability graph" angle. We want something that when selected as the new "top" will have roughly half the commits in it. Put another way: let's say that we have those thousand commits that _may_ have introduced a bug, but we don't know which one. What do we want? We want to find a new head that contains roughly five hundred of the unknown commits. No more, no less. Maybe we won't find a 500/500 split, but we definitely want a 600/400 split over a 1/999 split, since the 1/999 split isn't likely to get us any closer to the answer. Now, I say "roughly", because there may not _be_ any commit that exactly bisects the case. For a trivial case, let's say that we know that 'A' is bad, and 'B' is good, but the graph in between them looks like this: A / | a | | b | | \ c d e \ | / B so the bug might have been introduced by any of a-e, and we just don't know which one was the cause. Now, selecting either 'a' or 'b' as the new tip is a good choice, 'a' reaches two unknowns (a+c) while b reaches 3 unknowns (b+d+e) but the others only reach a single unknown. And notice how the two good bisection places were both _recent_. They were not in the middle "time-wise". Also note that we do NOT want to reach the maximal amount of commits: we want to reach _half_ the commits. If we select a commit that reaches a lot of other unknown commits ('A' is the maximal here, of course), then we're not actually bisecting the thing at all: we're just testing close to a "known bad" case. But let's say that 'c' didn't actually exist (or, alternatively, we have tested that one and know that "B+c" is good, which basically makes 'c' irrelevant for bug-hunting): in that case there is no good bisection, since whichever of the remaining ones we choose (a,b,d or e) we'll always have split it up so that we basically test the addition or subtraction of just one single commit. There's not anything "--bisect" can do about that, and it will just happen to pick one of them. And that's what I mean by "there may not be any commit that is nicely in the middle". Linus - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.htmlReceived on Sun Jun 19 13:41:20 2005
This archive was generated by hypermail 2.1.8 : 2005-06-19 13:41:21 EST