RE: new utility for decoding salinfo records

From: Luck, Tony <tony.luck_at_intel.com>
Date: 2005-01-12 06:49:52
>  Ben>        salinfo_decode2 also has the capability to generate
>  Ben> output that is designed to be easily parsed by a machine. This
>  Ben> is useful when you want to automate monitoring of large numbers
>  Ben> of machines. For example, instead of having scripts notify you
>  Ben> every time an ignorable single bit memory error occurs, the
>  Ben> monitoring scripts can easily ignore those errors and only
>  Ben> point out higher priority error conditions.
>
>It seems a bit dangerous to me to encourage ignoring single-bit
>errors.  Perhaps it would be better to suggest summarizing these
>errors instead?

Ben's world view might be a little skewed by his test case :-)

http://www.californiadigital.com/thunder.shtml
[The web page is out of date regarding its position on the top500
list; the system was pushed down to #5 in the latest list.]

For this system you really wouldn't want to wake your system
admins for every single-bit error that was reported (though
summarizing the errors in a weekly/monthly report would of course
be a good thing).  I believe that salinfo_decode2 makes
summarizing easy too.
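
As a rough illustration of the "alert on the serious ones, summarize
the rest" idea, here is a minimal Python sketch.  The record layout
(one line per record, whitespace-separated key=value pairs) and the
field names host, type, severity and bits are assumptions made up for
this example; they are not the actual salinfo_decode2 output format,
so a real script would need to be adapted to whatever the tool emits.

```python
from collections import Counter

# Hypothetical record layout: one record per line, key=value pairs.
# This is NOT the real salinfo_decode2 output format; it is invented
# here purely to illustrate the triage logic.
SAMPLE = """\
host=node001 type=memory severity=corrected bits=1
host=node001 type=memory severity=corrected bits=1
host=node017 type=memory severity=uncorrected bits=2
host=node042 type=memory severity=corrected bits=1
"""


def parse_record(line):
    """Split one 'key=value key=value' line into a dict of strings."""
    return dict(field.split("=", 1) for field in line.split())


def triage(lines):
    """Return (alerts, summary).

    alerts  -- records that should be reported immediately
    summary -- per-host count of corrected single-bit memory errors,
               suitable for a weekly or monthly report
    """
    alerts = []
    summary = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        rec = parse_record(line)
        single_bit = (
            rec.get("type") == "memory"
            and rec.get("severity") == "corrected"
            and rec.get("bits") == "1"
        )
        if single_bit:
            # Count it rather than dropping it, so trends stay visible.
            summary[rec.get("host", "unknown")] += 1
        else:
            alerts.append(rec)
    return alerts, summary


alerts, summary = triage(SAMPLE.splitlines())
```

Counting the single-bit errors per host instead of discarding them
keeps the information around for the periodic report, so a node that
starts logging many of them still stands out.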

-Tony
Received on Tue Jan 11 14:52:07 2005
