RE: [Linux-ia64] SAL error record logging/decoding

From: Luck, Tony <tony.luck_at_intel.com>
Date: 2003-05-30 06:49:53

Digging back in this thread to last Thursday ...

> > 2) I crashed my machine with an injected machine check, and
> > then rebooted.  All four of the /proc/sal/cpuX/mca files had
> > a copy of the same error record.  Echoing "clear" to one of
> > them made them all go away.
> 
> Hmm...  this sounds like a reflection of the underlying firmware
> behavior.  I tried this on a 2-way HP box, and the cpu0/mca
> file was different than cpu1/mca, and clearing one did not
> clear the other.
> 
> > I think this is normal ... but it may require some interesting
> > documentation to say why things work like this.
> 
> Why do you think that's normal?  It sounds pretty strange
> to me.

I asked a SAL expert here who said:

 "The SAL spec does not require that the SAL_GET_STATE_INFO API
  be called on the processor where the error was detected (for
  recoverable and fatal errors).  So in this case, the SAL has
  logged it to flash before handing off to the OS.  When the OS
  calls SAL_GET_STATE_INFO, it just retrieves the last error in
  the queue from the flash image.  The processor section of the
  error record has a field for the processor LID --- so you can
  check if the right processor observed the error."

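One quick way to tell whether the per-cpu files really hold the same
record is to compare them byte for byte. The sketch below is only
illustrative, not part of the salinfo code: it assumes that reading
/proc/sal/cpuX/mca returns the raw record, and that an empty read means
no record is pending. The helper name group_identical_records is made
up for this example.

```python
import glob
import hashlib
from collections import defaultdict


def group_identical_records(paths):
    """Map a SHA-256 digest to the list of files with byte-identical contents."""
    groups = defaultdict(list)
    for path in sorted(paths):
        with open(path, "rb") as f:
            data = f.read()
        # Assumption: an empty read means no record is pending for this cpu.
        if data:
            groups[hashlib.sha256(data).hexdigest()].append(path)
    return dict(groups)


if __name__ == "__main__":
    # Assumption: the per-cpu files follow the /proc/sal/cpuX/mca layout
    # discussed in this thread.
    records = group_identical_records(glob.glob("/proc/sal/cpu*/mca"))
    for digest, members in records.items():
        print(digest[:12], " ".join(members))
```

A single group listing all four cpus would match what I saw; one group
per cpu would match the behavior reported on the HP box. This does not
decode the processor LID field, it only shows whether the files are
copies of each other.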

What error did you inject in the case that you describe above
where you saw different independent records in cpu0/mca and
cpu1/mca?

-Tony
Received on Thu May 29 13:50:16 2003