Re: show_mem() for ia64 discontig takes a really long time on large systems.

From: Bob Picco <bob.picco_at_hp.com>
Date: 2006-03-29 06:23:17
Robin Holt wrote:	[Tue Mar 28 2006, 01:43:16PM EST]
> Recently, we ran a large system out of memory and oom_kill() appeared
> to have frozen up.  When we looked at the backtraces, we noticed the CPU
> was making progress, just not fast progress.  As a simple test, I ran
> 'echo m > /proc/sysrq-trigger', and it still had not completed after
> more than half an hour.
> 
> The system was a fully populated 512-node SGI machine.  Because of the
> way memory is physically laid out, each node is covered by a single
> pgdat that has two holes in it.  This is new hardware, with larger gaps
> between chunks of memory than earlier versions had.  As show_mem()
> traverses the entire system's memory to print statistics on remaining
> memory, it takes faults whenever it touches the holes in the array of
> struct pages.
> 
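A small user-space simulation may make the cost concrete. It is a sketch under stated assumptions, not kernel code: a 16 KiB page, a hypothetical 56-byte (non-power-of-2) struct page, and a virtual struct page array backed by eight pages of address space, three of them unmapped. entry_readable() is a hypothetical stand-in for a probing read that would fault in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SZ        16384UL /* assumed 16 KiB ia64 base page */
#define STRUCT_PAGE_SZ 56UL    /* hypothetical, non-power-of-2 struct page size */
#define NR_MAP_PAGES   8UL     /* pages of virtual space backing the simulated array */
#define NR_ENTRIES     (NR_MAP_PAGES * PAGE_SZ / STRUCT_PAGE_SZ)

/* Simulated backing of a virtual mem_map; false marks a hole. */
static const bool map_page_present[NR_MAP_PAGES] = {
    true, true, false, false, false, true, true, true,
};

/* Hypothetical probe: an entry is usable only if every byte of it lies in a
 * present page.  In the kernel, each failing check would cost a fault. */
static bool entry_readable(unsigned long idx)
{
    unsigned long first = idx * STRUCT_PAGE_SZ / PAGE_SZ;
    unsigned long last = (idx * STRUCT_PAGE_SZ + STRUCT_PAGE_SZ - 1) / PAGE_SZ;

    assert(last < NR_MAP_PAGES);
    for (unsigned long p = first; p <= last; p++)
        if (!map_page_present[p])
            return false;
    return true;
}

/* Naive walk: probe every entry in turn, so each entry that overlaps a hole
 * costs one failed probe. */
static unsigned long count_failed_probes(void)
{
    unsigned long failed = 0;

    for (unsigned long i = 0; i < NR_ENTRIES; i++)
        if (!entry_readable(i))
            failed++;
    return failed;
}
```

With these assumed sizes, the three-page hole costs 878 failed probes out of 2340 entries, roughly 16384 / 56 (about 292) faults for every unmapped page of the array.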
> At this point, I am looking for any direction on what would be a
> reasonable fix.  Should show_mem() skip ahead to the next page-aligned
> point in the array when an access faults?  Should we record the start
> and end of each hole in the pgdat?  Should we have one pgdat per chunk?
> Are there better ideas out there?  Any direction would be greatly
> appreciated.
This could work, but you need to be cautious: on ia64, sizeof(struct page)
isn't a power of 2, so a page-aligned point in the array generally doesn't
coincide with the start of a struct page. Also, this would have to be done
conditionally, because SPARSEMEM doesn't require it, though of course
VIRTUAL_MEM_MAP does.
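A minimal sketch of the rounding such a skip would need, assuming the caller knows the faulting address, a 16 KiB page, and a hypothetical 56-byte struct page. next_index_after_fault() and next_index_truncating() are illustrative names, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SZ        16384UL /* assumed 16 KiB ia64 base page */
#define STRUCT_PAGE_SZ 56UL    /* hypothetical, non-power-of-2 struct page size */

/* Index of the first entry that starts at or after the page boundary
 * following fault_addr.  Every entry that starts earlier overlaps the
 * faulting page (or an earlier one), so it cannot be read safely. */
static unsigned long next_index_after_fault(uintptr_t base, uintptr_t fault_addr)
{
    uintptr_t next_page;

    assert(fault_addr >= base);
    next_page = (fault_addr & ~(uintptr_t)(PAGE_SZ - 1)) + PAGE_SZ;
    /* Round up: truncating division can return an entry that starts inside
     * the faulting page and merely ends in the next one. */
    return (next_page - base + STRUCT_PAGE_SZ - 1) / STRUCT_PAGE_SZ;
}

/* Truncating variant: exact only when STRUCT_PAGE_SZ divides the distance
 * from base to the boundary, e.g. a power-of-2 size with a page-aligned base. */
static unsigned long next_index_truncating(uintptr_t base, uintptr_t fault_addr)
{
    uintptr_t next_page = (fault_addr & ~(uintptr_t)(PAGE_SZ - 1)) + PAGE_SZ;

    return (next_page - base) / STRUCT_PAGE_SZ;
}
```

For a fault at address 16800 with the array based at 0, the truncating version returns entry 585, which begins at byte 32760, still inside the faulting page, so reading it would fault again; rounding up gives 586, the first entry starting at or beyond 32768. A caller would still need to clamp the result to the end of the array, and in the kernel the skip would presumably be built only for CONFIG_VIRTUAL_MEM_MAP.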
> 
> Thanks,
> Robin Holt
You're welcome,

bob
Received on Wed Mar 29 06:23:55 2006

This archive was generated by hypermail 2.1.8 : 2006-03-29 06:24:05 EST