Re: [PATCH/RFC] I/O-check interface for driver's error handling

From: Benjamin Herrenschmidt <benh_at_kernel.crashing.org>
Date: 2005-03-02 09:27:27
On Tue, 2005-03-01 at 12:33 -0600, Linas Vepstas wrote:

> The current proposal (and prototype) has a "master recovery thread"
> to handle the coordinated reset of the pci controller.  This master
> recovery thread makes three calls in struct pci_driver:
> 
>    void (*frozen) (struct pci_dev *);  /* called when dev is first frozen */
>    void (*thawed) (struct pci_dev *);  /* called after card is reset */
>    void (*perm_failure) (struct pci_dev *);  /* called if card is dead */

See my other emails. I think only one callback is enough, and I think we
need more parameters.
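
To make that concrete, here is a rough sketch of what a single callback
with an extra parameter could look like. Everything in it is hypothetical:
the enum names, the error_notify member, the return codes and the
demo_state bookkeeping are made up for illustration, and the structs are
userspace stand-ins rather than the real kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel structure (assumption, not the real layout). */
struct pci_dev {
	const char *name;
	void *driver_data;
};

/* Hypothetical event codes covering the three cases in the quoted proposal. */
enum pci_err_event {
	PCI_ERR_FROZEN,		/* device has just been frozen */
	PCI_ERR_THAWED,		/* device has been reset */
	PCI_ERR_PERM_FAILURE,	/* device is considered dead */
};

/* Hypothetical result code so the driver can report back to the caller. */
enum pci_err_result {
	PCI_ERR_RESULT_OK,
	PCI_ERR_RESULT_GIVE_UP,
};

/* One callback instead of three; 'event' tells the driver what happened. */
struct pci_driver {
	enum pci_err_result (*error_notify)(struct pci_dev *dev,
					    enum pci_err_event event);
};

/* Hypothetical per-device state kept by an example driver. */
struct demo_state {
	int io_blocked;
	int dead;
};

/* Example driver-side handler that dispatches on the event code. */
enum pci_err_result demo_error_notify(struct pci_dev *dev,
				      enum pci_err_event event)
{
	struct demo_state *st;

	assert(dev != NULL);
	st = dev->driver_data;

	switch (event) {
	case PCI_ERR_FROZEN:
		st->io_blocked = 1;	/* stop touching the hardware */
		return PCI_ERR_RESULT_OK;
	case PCI_ERR_THAWED:
		st->io_blocked = 0;	/* card was reset, resume I/O */
		return PCI_ERR_RESULT_OK;
	case PCI_ERR_PERM_FAILURE:
		st->dead = 1;		/* no recovery possible */
		return PCI_ERR_RESULT_GIVE_UP;
	}
	return PCI_ERR_RESULT_GIVE_UP;	/* unknown event: be conservative */
}
```

With one entry point, new events or extra arguments can be added later
without growing struct pci_driver by another function pointer each time.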

> The master recovery thread runs in the kernel.  Earlier suggestions said
> "run it in user space, use pci hotplug, use udev, etc." However, if
> you get a pci error on a scsi card, you can't shell script 
> "umount /dev/sdX; rmmod scsi; clear_pci_error; insmod scsi; mount /dev/sdX"
> because you can't umount an open filesystem, and you can't really close
> it (I fiddled with prototyping some of this, but it's ugly and painful
> and bizarre and outside my area of expertise :)
> 
> FWIW, the current prototype tries to do a pci hotplug if the above
> routines aren't implemented in struct pci_driver.  It can recover 
> from pci errors on ethernet cards, and I have one scsi driver that
> successfully recovers with the above API, and am working on adding recovery
> to the symbios driver.
> 
> --linas
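
For reference, the fallback described above (use the driver's routines if
present, otherwise go through pci hotplug) boils down to something like the
sketch below. This is not the prototype's code: the recovery_path enum,
choose_recovery_path() and demo_noop() are hypothetical names, the structs
are userspace stand-ins, and requiring all three routines to be set is my
assumption about what "implemented" means.

```c
#include <assert.h>
#include <stddef.h>

struct pci_dev;

/* Stand-in carrying only the three callbacks from the quoted proposal. */
struct pci_driver {
	void (*frozen)(struct pci_dev *);	/* called when dev is first frozen */
	void (*thawed)(struct pci_dev *);	/* called after card is reset */
	void (*perm_failure)(struct pci_dev *);	/* called if card is dead */
};

/* Stand-in for the kernel structure (assumption, not the real layout). */
struct pci_dev {
	struct pci_driver *driver;
};

/* Hypothetical: which recovery strategy the master thread would pick. */
enum recovery_path {
	RECOVER_VIA_CALLBACKS,
	RECOVER_VIA_HOTPLUG,
};

/*
 * Pick the callback path only when the driver provides all three routines
 * (assumption); otherwise fall back to a hotplug remove/re-add cycle.
 */
enum recovery_path choose_recovery_path(const struct pci_dev *dev)
{
	const struct pci_driver *drv;

	assert(dev != NULL);
	drv = dev->driver;

	if (drv && drv->frozen && drv->thawed && drv->perm_failure)
		return RECOVER_VIA_CALLBACKS;
	return RECOVER_VIA_HOTPLUG;
}

/* Do-nothing callback used only to populate an example driver. */
void demo_noop(struct pci_dev *dev)
{
	(void)dev;
}
```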
-- 
Benjamin Herrenschmidt <benh@kernel.crashing.org>

Received on Tue Mar 1 17:33:59 2005