Re: Migrate pages from a ccNUMA node to another - patch

From: Dave Hansen <>
Date: 2004-04-06 01:40:33
On Mon, 2004-04-05 at 08:07, Zoltan Menyhart wrote:
> Hirokazu Takahashi wrote:
> > I guess arguments src_node, mm and pte would be redundant since
> > they can be looked up from old_p with the reverse mapping scheme.
> In my version 0.2, I can do with only the following arguments:
>  *		node:	Destination NUMA node
>  *		mm:	-> victim "mm_struct"
>  *		pte:	-> PTE of the page to be moved
> (If I have "mm" at hand, why not use it?  Why not avoid fetching the r-map
> page struct?)

That's a good point.  There is at least some cost (at least 1 lock)
associated with walking the rmap chains.  If it can be avoided, it might
as well be.  

But, if someone needs the "no walk" interface, just wrap the function:

	foo(page)
	{
		rmap_results = get_rmap_stuff(page);
		__foo(page, rmap_results);
	}

Callers that already have the rmap information at hand can call __foo()
directly and skip the walk.
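
Concretely, with the arguments from your 0.2 version it might look
something like this (just a sketch; the names are made up, and I'm
glossing over the fact that a shared page can have more than one
mm/pte pair):

	/*
	 * No-walk interface: the caller already knows the mm and the
	 * pte (Zoltan's case), so the rmap chains are never touched.
	 */
	int __migrate_page(struct page *page, int dst_node,
			   struct mm_struct *mm, pte_t *pte);

	/* Wrapper for callers that only have the page. */
	int migrate_page(struct page *page, int dst_node)
	{
		struct mm_struct *mm;
		pte_t *pte;

		/* the rmap walk (and its locking) happens only here */
		find_rmap_stuff(page, &mm, &pte);
		return __migrate_page(page, dst_node, mm, pte);
	}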

> > >Notes: "pte" can be NULL if I do not know it a priori.
> > >        I cannot release "mm->page_table_lock"; otherwise I have to re-scan "mm->pgd".
> > 
> > A re-scan policy would be much better, since migrating pages is heavy
> > work.  I don't think that holding mm->page_table_lock for a long time
> > would be a good idea.
> Re-scanning is a "cache killer", at least on IA64 with a huge user memory
> size.  I have more than 512 Mbytes of user memory, and its PTEs do not
> fit into the L2 cache.
> In my current design, the outer loops do the PGD, PMD and PTE walking;
> once I find a valid PTE, I check it against the list of max. 2048
> physical addresses as the inner loop.
> When I reversed them (walking through the list of max. 2048 physical
> addresses as the outer loop, with the PGD - PMD - PTE scans as the inner
> loops), migration became 4 to 5 times slower.

Could you explain where you're getting these "magic numbers"?  I don't
quite understand the significance of 2048 physical addresses or 512 MB
of memory.
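
Just to make sure I follow the two orderings, here's how I read them
(a rough sketch only; every name in it is invented, and locking, the
pgd/pmd validity checks, and pte_unmap() are all elided):

	/* #1 (your current design): one sequential pagetable walk,
	 * checking each present pte against the address list. */
	for (addr = start; addr < end; addr += PAGE_SIZE) {
		pgd_t *pgd = pgd_offset(mm, addr);
		pmd_t *pmd = pmd_offset(pgd, addr);
		pte_t *pte = pte_offset_map(pmd, addr);

		if (!pte_present(*pte))
			continue;
		for (i = 0; i < nr_paddrs; i++)	/* max. 2048 entries */
			if (pte_pfn(*pte) == (paddr_list[i] >> PAGE_SHIFT))
				migrate_one_page(mm, addr, dst_node);
	}

	/* #2 (the 4-5x slower one): the address list as the outer
	 * loop, so up to 2048 complete pgd->pmd->pte walks over
	 * pagetables that no longer fit in the L2 cache. */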

Zoltan, it appears that we have a bit of an inherent conflict over how
much CPU each of you expects to spend in the removal and migration
cases.  You're coming from an HPC environment where each CPU cycle is
valuable, while the people trying to remove memory are probably going to
be taking CPUs offline soon anyway, and care a bit less about how
efficiently they're using CPU and cache resources.

Could you be a bit more explicit about how expensive (CPU-wise) these
migrate operations can be?

-- Dave
