Re: [RFC/PATCH, 1/4] readX_check() performance evaluation

From: David Mosberger <davidm_at_napali.hpl.hp.com>
Date: 2004-01-30 06:28:01
>>>>> On Thu, 29 Jan 2004 09:23:20 +0100 ("CET), Matthias Fouquet-Lapar <mfl@kernel.paris.sgi.com> said:

  Matthias> We have done a rather large study with DIMMs that had SBEs
  Matthias> and have found no evidence that a SBE turns into a UCE,
  Matthias> i.e. the fact that a SBE is reported, is no indication
  Matthias> that the device might fail soon.

  Matthias> As a matter of fact the soft error rates increases while
  Matthias> parts use smaller process technologies and lower supply
  Matthias> voltages. Cosmic rays are one source for soft
  Matthias> errors. Another source are alpha particles emitted by the
  Matthias> solder.

Ehh, wait a second: you're saying that your study proved that if the
device isn't failing, it isn't failing. ;-) Of course you'll get noise
and perhaps even lots of it due to cosmic rays but this doesn't say
anything about the error pattern you when a device _is_ failing (e.g.,
due to overheating, over-clocking, or wrong voltage).  Or did your
study cover the cases where a system is operated under "out-of-spec"
situation?

  Matthias> Still I think it's important to log SBEs, but you probably
  Matthias> will need a treshhold in case you hit a hard SBE. Also
  Matthias> scrubbing the memory location (and re-read the location to
  Matthias> check if the error was transient or not) might be a good
  Matthias> idea if the memory controller supports this.  If it is a
  Matthias> true, hard SBE it should be reported. It also might be a
  Matthias> good idea to mark the page, so it does not get
  Matthias> re-allocated.

Yes.  And once I finally received Andi's earlier mails (guess I have
to thank MyDoom for that... ;-( ), it was clear that nobody argued for
turning off the error reporting.  The issue was only whether or not to
log a message via printk() (which, in this case, clearly isn't a good
idea).  So I think we're all in violent agreement.

	--david
-
To unsubscribe from this list: send the line "unsubscribe linux-ia64" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Received on Thu Jan 29 14:36:54 2004

This archive was generated by hypermail 2.1.8 : 2005-08-02 09:20:22 EST