Re: git-rev-list: "--bisect" flag

From: Linus Torvalds <torvalds@osdl.org>
Date: 2005-06-19 15:03:20
On Sat, 18 Jun 2005, Linus Torvalds wrote:
> 
> Now, I say "roughly", because there may not _be_ any commit that exactly 
> bisects the case. For a trivial case, let's say that we know that 'A' is 
> bad, and 'B' is good, but the graph in between them looks like this:

Let's make a different case, just to make it even more obvious.

Let's say that the graph is


        A
       / \
      a1  b1
      |   |
      a2  b2
      |   |
      a3  b3
      |   |
      a4  b4
      |   |
      a5  b5
      |   |
      a6  b6
      |   |
      a7  b7
      |   |
      a8  b8
      |   |
      a9  b9
       \ /
        B

and we know that "A" is bad, and "B" is good, but we don't know which of 
a1-a9/b1-b9 are buggy.

Where do we start testing?

We start testing at either 'a1' or 'b1', because those are the two values 
that bisect the list either into the "a1-a9" series, or the "b1-b9" 
series. Any other starting point would be a bug.

Or, let's say that the graph is

        A
       / \
      |   b1
      |   |
      |   b2
      |   |
      |   b3
      |   |
      a1  b4
      |   |
      a2  b5
      |   |
      a3  b6
      |   |
      |   b7
      |   |
      |   b8
      |   |
      |   b9
       \ /
        B


and in this case the right place to pick is 'b4', because that's the one
that reaches 6 commits (b4-b9), while it leaves 6 commits unreachable
(a1-a3, b1-b3): that's a perfect bisection, and now it's totally 
unambiguous (in the previous case a1 and b1 were equally good choices, in 
this case there is only one valid choice).

It gets more interesting when you have intermediate merges:

        A
       / \
      |   b1
      |   |
      |   b2
      |   |
      |   b3
      |   |
      a1  b4
      |   |
      a2  b5
      | / |
      a3  b6
      |   |
      |   b7
      |   |
      |   b8
      |   |
      |   b9
       \ /
        B

The above graph _looks_ very similar, but now the right place to bisect is
'b5', because that reaches six commits (a3 + b5-b9), and again there are
six commits unreachable (a1-a2 + b1-b4). Again, this is unambiguous -
there is one clearly superior choice.

The current --bisect algorithm may not always pick that best choice: I
think I should subtract one from the total number of commits, since right
now it counts the "good" commit too, ie 'A'. But I think it's at most
off-by-one.

The bigger problem is that the algorithm is something like O(N^3) in cost
- albeit with a fairly cheap constant factor. In other words, it might not
be worthwhile bisecting three years worth of development with it, and you
migth be better off starting with a rougher half-way-point algorithm
("let's try some release a year and a half ago first").

The performance problem seems to really be pretty theoretical: I can
bisect the developemnt from 2.6.12-rc2 to current head in 0.59 seconds, so
it's not like it's horribly slow for something like a couple of months
worth of development.

			Linus
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Received on Sun Jun 19 15:01:48 2005

This archive was generated by hypermail 2.1.8 : 2005-06-19 15:01:49 EST