Re: [patch] Remove limit on MCA recoveries

From: Russ Anderson <rja_at_sgi.com>
Date: 2005-01-18 06:27:10
Matthias Fouquet-Lapar wrote:
> Keith Owens wrote:
> > Russ Anderson <rja@sgi.com> wrote:
> > >The MCA recovery driver saves the addresses of memory errors
> > >in an array.  The array has 32 entries.  The effect is 
> > >that after 32 recoveries, the driver stops recovering.
> > >
> > >This patch removes the page_isolate array.  Since the array
> > >was only used to see if the page is already marked reserved,
> > >check the reserved bit instead of the array.
> > 
> > lkcd dumps kernel pages marked reserved, so lkcd will try to process
> > isolated pages.  We will eventually need to add a new page flag to mark
> > faulty pages.
> 
> Probably any other dump mechanism should be aware of bad HW pages as well,
> so we might be better off to add a flag right away.  While we are at it I
> would propose to have actually two flags :
> 
>   - hard error (which will cause a MCA and should be skipped when taking
>                 a system dump)
>   - soft error (page has encountered SBE, so we might want to avoid future
>                 allocation, but it can be dumped without causing an MCA)

Yes, there should be page->flags to indicate hard and soft memory errors,
such as PG_hard_error and PG_soft_error.  The dump code could look at those 
flags.

include/linux/page-flags.h has PG_error for indicating I/O errors, which 
is close but not quite what we need, given the way it is used.  

CCing linux-kernel since the flags are not ia64 specific.

-- 
Russ Anderson, OS RAS/Partitioning Project Lead  
SGI - Silicon Graphics Inc          rja@sgi.com
-
To unsubscribe from this list: send the line "unsubscribe linux-ia64" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Received on Mon Jan 17 14:27:27 2005

This archive was generated by hypermail 2.1.8 : 2005-08-02 09:20:34 EST