Re: [PATCH/RFC] I/O-check interface for driver's error handling

From: Jeff Garzik <jgarzik_at_pobox.com>
Date: 2005-03-02 03:37:24
Hidetoshi Seto wrote:
> Hi, long time no see :-)
> 
> Currently, I/O errors are not a leading cause of system failure.
> However, as Linux scalability keeps improving and ever larger
> numbers of PCI devices are connected to a single high-performance
> server, the risk of I/O errors is increasing day by day.
> 
> For example, the PCI parity error is one of the most common errors
> in the hardware world. However, the major cause of parity errors
> is not a hardware fault but a soft (transient) error - low voltage,
> humidity, natural radiation... etc. Even so, some platforms are
> sensitive enough to parity errors that they shut down the system
> immediately on such an error. So if device drivers can retry a
> transaction once it results in an error, we can reduce the risk
> of I/O errors.
> 
> So I'd like to suggest new interfaces that enable drivers to
> check for and detect errors and retry their I/O transactions easily.

I have been thinking about PCI system and parity errors, and how to 
handle them.  I do not think this is the correct approach.

A simple retry is... too simple.  If you are having a massive problem on 
your PCI bus, more action should be taken than a retry.

In my opinion each driver needs to be aware of PCI sys/parity errs, and 
handle them.  For network drivers, this is rather simple -- check the 
hardware, then restart the DMA engine.  Possibly turning off 
TSO/checksum to guarantee that bad packets are not accepted.  For SATA 
and SCSI drivers, this is more complex, as one must retry a number of 
queued disk commands, after resetting the hardware.
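
As a rough illustration of the network-driver case, here is a user-space
sketch (not kernel code) of "check the hardware, then restart the DMA
engine".  Everything named fake_nic_* is hypothetical, and the status
register and hardware check are simulated in a plain struct; only the
three status bit values follow the standard PCI status register layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Error bits of the standard PCI status register. */
#define PCI_STATUS_PARITY            0x0100 /* master data parity error */
#define PCI_STATUS_SIG_SYSTEM_ERROR  0x4000 /* device signalled SERR# */
#define PCI_STATUS_DETECTED_PARITY   0x8000 /* device detected parity error */
#define PCI_ERR_MASK (PCI_STATUS_PARITY | PCI_STATUS_SIG_SYSTEM_ERROR | \
                      PCI_STATUS_DETECTED_PARITY)

/* Hypothetical stand-in for a NIC driver's private state. */
struct fake_nic {
    uint16_t pci_status;        /* simulated PCI status register */
    bool hw_healthy;            /* simulated result of a hardware check */
    bool dma_running;
    bool offloads_enabled;      /* TSO / checksum offload */
    unsigned int pci_errors;    /* error counter for diagnostics */
};

/*
 * Returns true if a PCI system/parity error was pending.  On return the
 * error bits are cleared, offloads are off, and DMA is running again only
 * if the (simulated) hardware check passed.
 */
bool fake_nic_check_pci_error(struct fake_nic *nic)
{
    uint16_t err;

    assert(nic != NULL);

    err = nic->pci_status & PCI_ERR_MASK;
    if (!err)
        return false;

    nic->pci_errors++;

    /* Real hardware clears these bits on write-1; simulated here. */
    nic->pci_status = (uint16_t)(nic->pci_status & ~err);

    /* Stop DMA and stop trusting offloaded checksums/segmentation. */
    nic->dma_running = false;
    nic->offloads_enabled = false;

    /* Only restart the DMA engine if the hardware looks sane. */
    if (nic->hw_healthy)
        nic->dma_running = true;

    return true;
}
```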

A new API handles none of this.

	Jeff



Received on Tue Mar 1 11:38:20 2005
