Date: 2005-01-12 08:22:17
David Mosberger wrote:
> Yes.  While individual single-bit errors aren't terribly interesting,
> periodic summaries almost certainly would be.  If only so you know
> when to order replacement DIMMs... ;-)

The only reason customers care about single-bit errors (which are
recovered) is fear that they will soon lead to a multi-bit error
(which is not recoverable) that crashes the system.  If the system
recovers from multi-bit errors without crashing, either by killing
the app that hit the error or (better) by backing up to the last
checkpoint (losing processing time, but not data), then the
customer won't even care about single-bit errors.

Then the answer is you order the replacement DIMMs after they fail.  :-)

Or maybe not even then.  Hard drives have flaw tables that indicate
the parts of the disks to avoid.  If memory DIMMs had flaw tables,
and the equivalent of badblocks, why would you replace a DIMM?

