Re: [PATCH/RFC] I/O-check interface for driver's error handling

From: Linas Vepstas <>
Date: 2005-03-03 04:44:38
On Wed, Mar 02, 2005 at 11:28:01AM +0900, Hidetoshi Seto was heard to remark:
> Note that here is a difficulty: the MCA handler on some arch would run on
> special context - MCA environment. In other words, since some MCA handler
> would be called by non-maskable interrupt(e.g. NMI), so it's difficult to
> call some driver's callback using protected kernel locks from MCA context.
> Therefore what MCA handler could do is just indicates a error was there,
> by something like status flag which drivers can refer. And after possible
> deley, we would be able to call callbacks.

FWIW, many device drivers do I/O in an interrupt context (e.g. from
timer interrupts), which is why error recovery needs to run in a
separate thread from error detection.

On ppc64, here's what I currently do:
-- check for pci error (call firmware & ask it)
-- if (error) put event on a workqueue
-- workqueue handler sends out a notifier_call_chain 
   to anyone who cares.

Thus, the error recovery thread runs in its own thread.
Right now, the above code is arch-specific, but I don't see any
reason we couldn't make the event, the workqueue and the
notifier_call chain arch-generic. 

Below is some "pseudocode" version (mentally substitute 
"pci error event" for every occurance of "eeh").  Its got some
ppc64-specific crud in there that we have to fix to make it 
truly generic (I just cut and pasted from current code).

Would a cleaned up version of this code be suitable for a 
arch-generic pci error recovery framework?  Seto, would 
this be useful to you?


Header file:

 * EEH Notifier event flags.
 * Freeze -- pci slot is frozen, no i/o is possible

/** EEH event -- structure holding pci slot data that describes
 *  a change in the isolation status of a PCI slot.  A pointer
 *  to this struct is passed as the data pointer in a notify callback.
struct eeh_event {
   struct list_head     list;
   struct pci_dev       *dev;
   int                  reset_state;
   int                  time_unavail;

/** Register to find out about EEH events. */
int eeh_register_notifier(struct notifier_block *nb);
int eeh_unregister_notifier(struct notifier_block *nb);

C file:

/* EEH event workqueue setup. */
static spinlock_t eeh_eventlist_lock = SPIN_LOCK_UNLOCKED;
static void eeh_event_handler(void *);
DECLARE_WORK(eeh_event_wq, eeh_event_handler, NULL);

static struct notifier_block *eeh_notifier_chain;

 * eeh_register_notifier - Register to find out about EEH events.
 * @nb: notifier block to callback on events
int eeh_register_notifier(struct notifier_block *nb)
   return notifier_chain_register(&eeh_notifier_chain, nb);

 * eeh_unregister_notifier - Unregister to an EEH event notifier.
 * @nb: notifier block to callback on events
int eeh_unregister_notifier(struct notifier_block *nb)
   return notifier_chain_unregister(&eeh_notifier_chain, nb);

 * queue up a pci error event to be dispatched to all listeners
 * of the pci error notifier call chain.  This routine is safe to call
 * within an interrupt context.  The actual event delivery
 * will be from a workque thread.

void eeh_queue_failure(struct pci_dev *dev) 
	struct eeh_event  *event;

   event = kmalloc(sizeof(*event), GFP_ATOMIC);
   if (event == NULL) {
      printk (KERN_ERR "EEH: out of memory, event not handled\n");
      return 1;

   event->dev = dev;
   event->reset_state = rets[0];
   event->time_unavail = rets[2];

   /* We may be called in an interrupt context */
   spin_lock_irqsave(&eeh_eventlist_lock, flags);
   list_add(&event->list, &eeh_eventlist);
   spin_unlock_irqrestore(&eeh_eventlist_lock, flags);

   /* Most EEH events are due to device driver bugs.  Having
    * a stack trace will help the device-driver authors figure
    * out what happened.  So print that out. */
   if (rets[0] != 5) dump_stack();

 * eeh_event_handler - dispatch EEH events.  The detection of a frozen
 * slot can occur inside an interrupt, where it can be hard to do
 * anything about it.  The goal of this routine is to pull these
 * detection events out of the context of the interrupt handler, and
 * re-dispatch them for processing at a later time in a normal context.
 * @dummy - unused
static void eeh_event_handler(void *dummy)
   unsigned long flags;
   struct eeh_event  *event;

   while (1) {
      spin_lock_irqsave(&eeh_eventlist_lock, flags);
      event = NULL;
      if (!list_empty(&eeh_eventlist)) {
         event = list_entry(, struct eeh_event, list);
      spin_unlock_irqrestore(&eeh_eventlist_lock, flags);
      if (event == NULL)

      if (event->reset_state != 5) {
         printk(KERN_INFO "EEH: MMIO failure (%d), notifiying device "
                "%s %s\n", event->reset_state,
                pci_name(event->dev), pci_pretty_name(event->dev));

      notifier_call_chain (&eeh_notifier_chain,
                 EEH_NOTIFY_FREEZE, event);


To unsubscribe from this list: send the line "unsubscribe linux-ia64" in
the body of a message to
More majordomo info at
Received on Wed Mar 2 12:50:44 2005

This archive was generated by hypermail 2.1.8 : 2005-08-02 09:20:36 EST