Handling nested MCA/INIT

From: Keith Owens <kaos_at_sgi.com>
Date: 2005-10-18 01:03:44
How should we handle nested MCA/INIT events?  There is only one PAL
minstate area per cpu so any nested MCA/INIT will overwrite the current
data, making it impossible to recover.  The best we can do with a
nested event is get some information on why the handlers died, then
reboot.

The current MCA/INIT handlers run with psr.mc = 1, so nested events
cannot be delivered.  This makes it impossible to use the NMI button to
find out why the MCA/INIT handler is hung.  I am thinking of changing
mca_asm.S to set psr.mc to 0 to allow nested events.  The handlers
would detect a nested event, gather minimal diagnostics, then reboot.
That may let us diagnose hung MCA/INIT handlers.  Right now we get no
data for this case, which is extremely frustrating.
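
As a rough illustration of the nested-event detection described above,
here is a minimal user-space sketch in C.  It is not the kernel code and
not a proposed patch: the names in_mca_handler, mca_enter, mca_exit,
enum mca_action and the NR_CPUS value are all hypothetical, and a real
implementation would live in the mca_asm.S/mca.c entry path and use
real per-cpu data.  It only shows the idea that a second entry on the
same cpu, before the first handler has finished, is treated as "gather
diagnostics and reboot" rather than as a recoverable event.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4	/* hypothetical value, for this sketch only */

/* What the handler is allowed to do after it has been entered. */
enum mca_action {
	MCA_HANDLE_NORMALLY,		/* first event: minstate is intact */
	MCA_DIAGNOSE_AND_REBOOT		/* nested event: minstate was overwritten */
};

/* Hypothetical per-cpu marker: true while an MCA/INIT handler runs. */
static bool in_mca_handler[NR_CPUS];

/* Called on handler entry; reports whether this event is nested. */
enum mca_action mca_enter(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);

	if (in_mca_handler[cpu]) {
		/*
		 * A handler was already running on this cpu, so the single
		 * PAL minstate area no longer describes the first event.
		 * Recovery is off the table; only diagnostics remain.
		 */
		return MCA_DIAGNOSE_AND_REBOOT;
	}

	in_mca_handler[cpu] = true;
	return MCA_HANDLE_NORMALLY;
}

/* Called when a non-nested handler completes normally. */
void mca_exit(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	in_mca_handler[cpu] = false;
}
```

In this sketch the marker is deliberately left set on the nested path,
since the only way out of a nested event is a reboot.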

The only downside that I can see is that if the handler is accessing
memory with a hard double-bit error, we could get nested MCA events.
Since the only thing we can do if the MCA handler gets an MCA is to
reboot, the nested event is not really a problem, and allowing nested
MCA may still give us better diagnostics.

Received on Tue Oct 18 01:04:24 2005
