Re: new utility for decoding salinfo records

From: Russ Anderson <rja_at_efs.americas.sgi.com>
Date: 2005-01-12 08:22:17
David Mosberger wrote:
> 
> Yes.  While individual single-bit errors aren't terribly interesting,
> periodic summaries almost certainly would be.  If only so you know
> when to order replacement DIMMs... ;-)

The only reason customers care about single-bit errors (which are
recovered) is the fear that they will soon lead to a multi-bit error
(which is not recoverable) that crashes the system.  If the system
recovers from multi-bit errors without crashing, either by killing the
app that hit the error or (better) by backing up to the last
checkpoint (losing processing time, but not data), then the customer
won't even care about single-bit errors.

Then the answer is you order the replacement DIMMs after they fail.  :-)

Or maybe not even then.  Hard drives have flaw tables that indicate
the parts of the disks to avoid.  If DIMMs had flaw tables,
and the equivalent of badblocks, why would you replace a DIMM?
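
As a rough illustration of the flaw-table idea (a hypothetical sketch,
not an existing kernel interface; the FlawTable name, its methods, and
the 16 KB page size are all assumptions made up for this example), the
OS could record the pages that have taken errors and simply stop
handing them out:

```python
# Hypothetical sketch of a per-system memory flaw table.
# Nothing here corresponds to a real kernel or firmware API.

PAGE_SIZE = 16 * 1024  # assumed page size, for illustration only


class FlawTable:
    """Tracks physical pages that should no longer be allocated."""

    def __init__(self, page_size=PAGE_SIZE):
        self.page_size = page_size
        self.bad_pages = set()  # page frame numbers known to be flawed

    def mark_bad(self, phys_addr):
        """Record the page containing phys_addr as flawed."""
        self.bad_pages.add(phys_addr // self.page_size)

    def is_usable(self, phys_addr):
        """Return True if the page containing phys_addr is not flawed."""
        return phys_addr // self.page_size not in self.bad_pages

    def usable_pages(self, total_pages):
        """List the page frame numbers that may still be allocated."""
        return [p for p in range(total_pages) if p not in self.bad_pages]


table = FlawTable()
table.mark_bad(0x4000)  # an error was reported somewhere in page 1
```

With a table like this the allocator skips the flawed page and the
rest of the DIMM stays in service, much as a filesystem skips the
blocks that badblocks reports.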

-- 
Russ Anderson, OS RAS/Partitioning Project Lead  
SGI - Silicon Graphics Inc          rja@sgi.com
Received on Tue Jan 11 17:38:54 2005
