Re: [Linux-ia64] rx2600 HW-error only when running 2.4.20

From: Alex Williamson <alex_williamson_at_hp.com>
Date: 2003-03-18 08:18:44
   If you just want to get rid of the error, turn off CONFIG_IA64_MCA
or comment out the call to ia64_mca_cpe_poll in:

arch/ia64/kernel/mca.c:ia64_mca_late_init()

The lspci output listed below is just the fake PCI device for the
zx1 local bus adapter.  Bus 0x80 is slot 1 on the rx2600 (top slot).
Is there a card installed there?  It may be worth running diagnostics
on the system if you're getting errors like this from an empty slot.
If there's a device in that slot the system doesn't know how to handle,
there may be some useful messages in the log the firmware prints to the
serial console during bootup.  Let me know if you want to debug this
further.  Thanks,

	Alex

Steinar Traedal-Henden wrote:
> 
> Hi Alex,
> 
> So, it's nothing to worry about, but how can I configure the kernel so that the
> error message disappears? It really fills up the syslog.
> 
> Here is the output of lspci and errdump (hope you can help):
> 
> [compute-1-0]# lspci -s 0x80: -vvv
> 80:1e.0 Host bridge: Hewlett-Packard Company zx1 Local Bus Adapter (rev 32)
>         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
>         Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
>         Latency: 64, cache line size 20
>         Region 0: Memory at 00000000fed28000 (32-bit, non-prefetchable) [size=8K]
>         Capabilities: [a0] PCI-X non-bridge device.
>                 Command: DPERE+ ERO- RBC=0 OST=0
>                 Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-
> 
> Shell> errdump cpe
> **** CPE Error Log Dump ****
> 
> Firmware Revision: fwbtr_main_view.01.44-0
> Architected SAL Record ID  0x0000000000000000
> Time this log was recorded: 03/17/2003 at 11:19:30
> 
> **** zx1 IOC Registers ****
>   iocErrorValid                 0x0000000000000000
> 
> **** PCI Component Registers ****
>   pciCompErrorValid             0x0000000000000000
> 
> **** PCI Bus Registers ****
>   pciBusErrorValid              0x0000000000000001
> 
>   ---- PCI Bus ----
>   validation_bits               0x000000000000048f
>   error_status                  0x00000000004a1700
>   error_type                    0x            0000
>   bus_id                        0x            0080
>   bus_addr                      0x0000000000000000
>   bus_data                      0x0000000000000000
>   bus_cmd                       0x0000000000000000
>   bus_requestor_id              0x0000000000000000
>   bus_responder_id              0x00000000fed28000
>   bus_target_id                 0x0000000000000000
>   bus_oem_id[0]                 0x000000000000122e
>   bus_oem_id[1]                 0x0000000000000000
>   cellNum                       0x        00000000
>   sbaNum                        0x            0000
>   ropeNum                       0x            0004
>   .... Mercury LBA ....
>   error_status 0x688            0x0000080100000801
>   master_id_log 0x0690          0x0000000000000010
>   inbound_err_add 0x0290        0x0000000000000000
>   inbound_err_attrib 0x0298     0x0000000000000000
>   completion_msg_log 0x02A0     0x0000000000000000
>   outbound_err_address 0x0070   0x0000000000000000
>   error_config 0x0680           0x0000000000001d50
>   status_info_cntrl 0x0108      0x0000000000000040
>   function_id 0x0000            0x02b00146122e103c
>   capabilities_list 0x0060      0x0f00023700200002
>   agp_command 0x0068            0x0000000000000000
>   pcix_capabilities 0x00A0      0x0013ff0000010007
>   olr_control 0x0600            0x0002371d00032403
>   clock_control 0x0618          0x0000000000000038
>   bus_mode 0x0620               0xa1a974ae2f3504c0
> 
> regards
> Steinar
> 
> On Mon, 17 Mar 2003, Alex Williamson wrote:
> 
> > Steinar Traedal-Henden wrote:
> > >
> > > Hi,
> > >
> > > I get the following HW error on an HP rx2600 when I run my own compiled
> > > 2.4.20 kernel.
> > >
> > > Mar 17 04:13:35 compute-1-0 kernel: +BEGIN HARDWARE ERROR STATE AT CPE
> > > Mar 17 04:13:35 compute-1-0 kernel: +Err Record ID: 2833    SAL Rev:  0.02
> > > Mar 17 04:13:35 compute-1-0 kernel: +Time: 03/17/2003 04:19:49    Severity 2
> > > Mar 17 04:13:35 compute-1-0 kernel: +Platform PCI Bus Error Info Section
> > > Mar 17 04:13:35 compute-1-0 kernel: + PCI Bus Error Detail:  Error Status: 0x4a1700 Error Type: 0x0 Bus ID: 0x80 Bus Address: 0x0 Responder ID: 0xfed28000+END HARDWARE ERROR STATE AT CPE
> >
> >    You're getting a CPE (Corrected Platform Error) record.  Polling
> > for CPEs was added in 2.4.20, so it's not surprising you didn't see
> > them before.  The good news is that the error was corrected; this is
> > just the system telling you about it.  You should probably still try
> > to figure out what the problem is, though, in case it leads to
> > uncorrectable errors that will MCA (Machine Check Abort) your box.
> > Most of the error record is documented in the SAL spec.  Here's what
> > we can determine:
> >
> > Error Status: 0x4a1700
> >
> >  - bits 8-15 = Error Type 0x17 = 23 = ERR_PROTOCOL (Detection of a protocol error)
> >  - bit 17 = Control: Error was detected on the control signals or in
> >             the control portion of the transaction
> >  - bit 19 = Responder: Error was detected by the responder of the transaction
> >  - bit 22 = Overflow
> >
> > Error Type: 0x0 = Unknown or OEM System Specific Error
> >
> > What do you have in the slot corresponding to bus 0x80?  An lspci -vvv
> > might be helpful.  If you go back to the EFI Shell and run 'errdump cpe',
> > that might give us more information about what's happening.
> > Thanks,
> >
> >       Alex
> >
> > --
> > Alex Williamson                             HP Linux & Open Source Lab
> >
> > _______________________________________________
> > Linux-IA64 mailing list
> > Linux-IA64@linuxia64.org
> > http://lists.linuxia64.org/lists/listinfo/linux-ia64
> >
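
For reference, the Error Status decoding quoted above can be sketched in a
few lines of Python.  This is a minimal illustration, not an HP or kernel
tool: the bit positions (8-15 error code, 17 control, 19 responder,
22 overflow) are exactly the ones given in the quoted reply, while the
function names parse_error_status and decode_error_status are hypothetical
and only the bits discussed in this thread are named.

```python
import re

# Bit positions in the SAL PCI bus error "error_status" field, as decoded
# in the quoted reply; only the bits discussed there are named here.
CONTROL_BIT = 17     # error in the control signals / control portion
RESPONDER_BIT = 19   # error detected by the responder of the transaction
OVERFLOW_BIT = 22    # overflow
ERR_PROTOCOL = 0x17  # error code 23: detection of a protocol error

FLAG_BITS = ((CONTROL_BIT, "control"),
             (RESPONDER_BIT, "responder"),
             (OVERFLOW_BIT, "overflow"))


def parse_error_status(line):
    """Return the 'Error Status:' value from a CPE log line, or None."""
    m = re.search(r"Error Status:\s*(0x[0-9a-fA-F]+)", line)
    return int(m.group(1), 16) if m else None


def decode_error_status(status):
    """Return (error code from bits 8-15, names of the flag bits set)."""
    code = (status >> 8) & 0xFF
    flags = [name for bit, name in FLAG_BITS if status & (1 << bit)]
    return code, flags


if __name__ == "__main__":
    line = ("+ PCI Bus Error Detail:  Error Status: 0x4a1700 "
            "Error Type: 0x0 Bus ID: 0x80")
    code, flags = decode_error_status(parse_error_status(line))
    print(hex(code), code == ERR_PROTOCOL, flags)
    # prints: 0x17 True ['control', 'responder', 'overflow']
```

Only the flag bits named in the thread are checked; any other bit set in
error_status would need the SAL spec's field definitions to interpret.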

--
Alex Williamson                             HP Linux & Open Source Lab
Received on Mon Mar 17 13:22:14 2003
