Re: [RFC/PATCH, 1/4] readX_check() performance evaluation

From: Andi Kleen <ak_at_suse.de>
Date: 2004-01-29 07:01:32
On Wed, 28 Jan 2004 11:48:05 -0800
David Mosberger <davidm@napali.hpl.hp.com> wrote:

> >>>>> On Wed, 28 Jan 2004 20:39:15 +0100, Andi Kleen <ak@suse.de> said:
> 
>   >> Yet they are a good indicator that something is wrong (not performing
>   >> properly) or may be failing soon.  I don't think putting on blinders
>   >> for such problems is a good idea.  Though I agree that the question of
> 
>   Andi> Most server class hardware should log it somewhere and allow
>   Andi> to read the event log in the firmware. This even works for
>   Andi> unhandleable errors unlike what the OS could do.
> 
> And you'd want to reboot your server just so you can check on the soft
> failure rate? ;-)

Yep, I reboot my machines all the time ;-) 

Seriously, you can count the events and present the counts in sysfs or /proc.
Or log them somewhere else and supply a dedicated utility to show them,
one that makes it clear that the events are hardware related, not software related.
I suppose that if your server vendor is serious, they will supply a tool
to read the firmware log from a running system.

But printks enabled by default are a bad idea (and a bug too, BTW: printk called
from MCE handlers can randomly deadlock).

-Andi
Received on Wed Jan 28 15:09:00 2004
