Excuse me if this ends up being a duplicate. I mailed this out last night,
but for some reason it hasn't come through. It is not in the archives, nor
have I seen it come back to my mailbox.

Here is a new utility for looking into salinfo records. It does several
things differently than salinfo_decode. We have found that it helps
considerably in understanding problems on our Itanium servers.

The attached patch applies to salinfo-0.7 and does not modify
salinfo_decode's functioning in any way. In fact, the only files that are
modified are the Makefile and the spec file.

Here is the man page, which tries to illustrate some of the features that
were designed into salinfo_decode2.

SALINFO_DECODE2(8)    Decode Itanium SAL Error Records    SALINFO_DECODE2(8)

NAME
       salinfo_decode2 - decode Itanium SAL error records

SYNOPSIS
       salinfo_decode2 [OPTION]... [FILE | DIRECTORY]...

DESCRIPTION
       salinfo_decode2 decodes CMC/CPE/MCA/INIT records obtained from the
       SAL. It takes a list of files or directories and prints the requested
       information about the salinfo records contained in those files. This
       is notably different from the salinfo_decode program, which processes
       only a single record at a time.

       Experience has shown that it can be difficult to identify a hardware
       failure of the type found in the salinfo logs because the failure
       results in many salinfo records being created. salinfo_decode2 allows
       a system administrator to glance at a directory full of errors, or
       some subset of files, and obtain an overall impression of how
       meaningful the errors are. This is done by turning down the verbosity
       and generalizing what is there. More experienced administrators can
       turn up the verbosity and get progressively more detailed
       information.

       salinfo_decode2 can also generate output that is designed to be
       easily parsed by a machine. This is useful when you want to automate
       monitoring of large numbers of machines.
       For example, instead of having scripts notify you every time an
       ignorable single-bit memory error occurs, the monitoring scripts can
       easily ignore those errors and only point out higher-priority error
       conditions.

       If no files or directories are specified on the command line, stdin
       is read and is assumed to be a SAL record.

       salinfo_decode2 also has the advantage that a SAL record from an ia64
       machine can be inspected and analyzed on a non-ia64,
       non-little-endian machine. For example, a system administrator using
       an ia32 workstation can inspect SAL records from an ia64 cluster. The
       design of the original salinfo_decode's internal architecture
       precludes this kind of cross-platform use.

OPTIONS
       -h, --help
              Print usage and exit.

       -V, --version
              Print version information and exit.

       -c, --cmc
              Only print CMC records.

       -p, --cpe
              Only print CPE records.

       -m, --mca
              Only print MCA records.

       -i, --init
              Only print INIT records.

       -d, --dimm-offset
              Count DIMMs starting at 1, not 0. This is useful when the SAL
              reports failures starting with 0 but the numbers silk-screened
              on the motherboard begin with 1. This helps reduce system
              administrator confusion when replacing a memory DIMM.

       -o, --cpu-offset
              Count CPUs starting at 1, not 0. This is useful when the SAL
              reports failures starting with 0 but the numbers silk-screened
              on the motherboard begin with 1. This helps reduce system
              administrator confusion when replacing CPUs.

       --tiger4
              The same as -d and -o. The Intel Tiger 4 motherboard's
              silkscreen counts both CPUs and DIMMs beginning with 1 rather
              than 0, which is what the SAL returns.

       -f, --forgiving
              Be forgiving of errors when opening files and reading data.

       -r, --recursive
              When a database is a directory, traverse its subdirectories.

       -v, --verbosity
              Specify the verbosity with which to print records. Verbosity
              can be 1-6. However, as the verbosity increases, the
              likelihood that the printing of the detailed information
              hasn't been implemented yet also increases.
              Patches to remedy this situation are eagerly accepted. The
              goal of the progressive levels of verbosity is to facilitate
              understanding of records, not just to blurt out every scrap of
              available information. Since verbosity 6 is largely not
              implemented yet, if you need all of the available information,
              use the original salinfo_decode.

       -s, --scriptable
              Output in a machine-readable format. This is designed to
              facilitate quick and easy shell scripting with the output.
              Refer to the EXAMPLES section for intended use.

EXAMPLES
       Pointing salinfo_decode2 at a directory of a few errors with the
       verbosity set very low shows that the errors are all inconsequential:

       $ ./salinfo_decode2 -v1 tigertest/
       cpe with severity "corrected" occurred at 12:03:08 on Apr 1 2004
       cpe with severity "corrected" occurred at 12:03:10 on Apr 1 2004
       cpe with severity "corrected" occurred at 12:32:14 on Apr 1 2004
       cpe with severity "corrected" occurred at 17:24:44 on Apr 1 2004

       Here is an example of how different levels of verbosity present the
       same SAL record differently:

       $ ./salinfo_decode2 -v1 sample_data/tdev2-2004-04-01-12:03:08-cpu1-cpe0
       cpe with severity "corrected" occurred at 12:03:08 on Apr 1 2004

       $ ./salinfo_decode2 -v2 sample_data/tdev2-2004-04-01-12:03:08-cpu1-cpe0
       record 612413502631444488 contains the following sections:
       (PCI component) (PCI component) (PCI component) (PCI component) (memory) (platform specific)

       $ ./salinfo_decode2 -v3 sample_data/tdev2-2004-04-01-12:03:08-cpu1-cpe0
       record 612413502631444488 contains the following sections:
       PCI component with (vend/dev) 8086/500 at (Seg/Bus/Dev/Func) 0/255/24/0 reported a fault
       PCI component with (vend/dev) 8086/501 at (Seg/Bus/Dev/Func) 0/255/24/1 reported a fault
       PCI component with (vend/dev) 8086/502 at (Seg/Bus/Dev/Func) 0/255/24/2 reported a fault
       PCI component with (vend/dev) 8086/503 at (Seg/Bus/Dev/Func) 0/255/24/3 reported a fault
       Memory fault at (node/card/module/bank/device) 0/0/8/0/0
       OEM component with id 0x44fc4766d807e40f reported a fault

       Here is an example of how to use the scriptable interface to change
       the formatting of the output and to select, out of many records, the
       one that matches a specific criterion:

       $ ./salinfo_decode2 -v1 -s sample_data/ | while read line;do
       >   eval $line
       >   if [ "$severity" != "corrected" ];then
       >     echo $month/$day/$year
       >   fi
       > done
       4/1/2004

BUGS
       Many levels of verbosity for many types of errors are not yet
       implemented. The project reached a state where it did what the users
       needed it to do, and then I was asked to work on other things.
       Patches are gratefully accepted.

AUTHOR
       Ben Woodard <woodard@redhat.com>

SEE ALSO
       salinfo_decode(8)

Linux                             Jan 6, 2005             SALINFO_DECODE2(8)
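
For anyone who wants to try the scriptable loop from the EXAMPLES section without an ia64 box at hand, here is a minimal sketch of the same filter wrapped in a function. The man page does not show what -s actually prints, so the record lines below are an assumption inferred from the `eval $line` idiom: one line of shell variable assignments per record. `sample_records` is a hypothetical stand-in for `salinfo_decode2 -v1 -s DIR`, and its data is made up for illustration.

```shell
#!/bin/sh
# Hypothetical stand-in for `salinfo_decode2 -v1 -s DIR`. The one-line,
# shell-assignment record format is an assumption, not the tool's
# documented output; the records themselves are invented.
sample_records() {
    cat <<'EOF'
severity=corrected month=4 day=1 year=2004
severity=corrected month=4 day=1 year=2004
severity=fatal month=4 day=1 year=2004
EOF
}

# Print the date of every record whose severity is not "corrected".
report_uncorrected() {
    while read -r line; do
        # Clear the fields first so a value from the previous record
        # cannot leak into a record that omits it.
        severity= month= day= year=
        eval "$line"
        if [ "$severity" != "corrected" ]; then
            echo "$month/$day/$year"
        fi
    done
}

sample_records | report_uncorrected
# prints: 4/1/2004
```

With the real tool, the last line would become `./salinfo_decode2 -v1 -s sample_data/ | report_uncorrected`. Because `eval` executes whatever it is handed, this should only be fed output from a decoder you trust.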