Re: [PATCH] New way of storing MCA/INIT logs

From: Robin Holt <holt_at_sgi.com>
Date: 2008-03-12 01:32:47

On Tue, Mar 11, 2008 at 03:07:20PM +0100, Zoltan Menyhart wrote:
> Let me ask again: do you expect _independent_ MCAs to happen?
> If you have an estimate of the probability of independent
> MCAs happening at the same time, different from what I calculated,
> then please share it with us.
>
> If the MCAs are the consequences of the same error event, then
> you can find out what they are, where they are from 2 or 3 logs.
>
> The code actually tries to recover local MCAs only. They are:
> - TLB errors: per CPU local. As the CPUs are much more reliable
>  than the other components, e.g. the memory, having two or
>  more CPUs with corrupted TLBs at the same time is really unlikely.
> - I/O or memory read errors:
>  + One error has affected N CPUs: the first log is enough.
>  + More than one independent error at the same time: assuming
>    my estimations are more or less correct...

I don't know enough about this area to be of much use, but I do recall
times when a customer machine ran into an error and neither the first
nor the last record was of any use; only one of the intermediate
records was.  I recall taking nearly a day to find the critical
difference.  I vaguely recall there were on the order of 120 records,
and the useful record was numbered in the low 80s.  Russ certainly has
more experience in this area.

Thanks,
Robin
Received on Wed Mar 12 01:33:05 2008