Re: [NUMA] Display and modify the memory policy of a process through /proc/<pid>/numa_policy

From: Christoph Lameter <clameter_at_engr.sgi.com>
Date: 2005-07-16 09:11:00

On Sat, 16 Jul 2005, Andi Kleen wrote:

> > Updating the memory policy is also useful if memory on one node gets 
> > short and you want to redirect allocations to a node that has memory free. 
> 
> If you use MEMBIND just specify all the nodes upfront and it'll
> do the normal fallback in them. 
> 
> If you use PREFERRED it'll do that automatically anyway.

No, it won't. If you know that you are going to start a process that must 
run on node 3 and know it's going to use 2G, but there is only 1G free, 
then you may want to modify the policy of an existing huge process on 
node 3 that is still allocating, so that its allocations go to node 2, 
which just happens to have free space.
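
A rough sketch of the decision a sysadmin tool or batch scheduler would 
make in that scenario. This is illustration only: pick_redirect_node() 
and its parameters are hypothetical names, not part of the patch, and the 
per-node free memory figures are assumed to come from wherever the tool 
already gets them.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of any patch in this thread).
 * Given the free memory on each node, pick the node to which an
 * existing process's allocations should be redirected.
 *
 * free_bytes[n] : free memory on node n, in bytes (assumed input)
 * nr_nodes      : number of entries in free_bytes
 * reserved_node : node being kept free for the incoming job; never chosen
 * min_free      : a candidate must have at least this much free memory
 *
 * Returns the candidate with the most free memory, or -1 if none qualifies.
 */
int pick_redirect_node(const unsigned long long *free_bytes, size_t nr_nodes,
                       size_t reserved_node, unsigned long long min_free)
{
    int best = -1;

    for (size_t n = 0; n < nr_nodes; n++) {
        if (n == reserved_node)
            continue;               /* this node is the one running short */
        if (free_bytes[n] < min_free)
            continue;               /* not enough room to be useful */
        if (best < 0 || free_bytes[n] > free_bytes[(size_t)best])
            best = (int)n;
    }
    return best;
}
```

With node 3 reserved and node 2 holding the most free memory, this returns 
2. Applying the result to the other process is the step that needs an 
interface like the one under discussion.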

> > A batch scheduler may anticipate memory shortages and redirect memory 
> > allocations in order to avoid page migration.
> I think that job belongs more to the kernel. After all we don't
> want to move half of our VM into your proprietary scheduler.

Care to tell me which proprietary scheduler you are talking about? I was 
not aware of the existence of such a thing. I am particularly surprised that 
this proprietary scheduler exists before we have a working interface.

And you are now going to implement automatic page migration into the 
existing scheduler?

> > I'd rather have that logic in userspace rather than fix up page_migrate 
> > again and again and again. Automatic recalculation of memory policies is 
> > likely an unexpected side effect of the existing page migration code. 
> 
> Only if you migrate again and again.

If you encounter a different situation then you may need a different address 
translation. For example, let's say you want to move a process from nodes 3 
and 4 to node 5. That won't work with the existing patches. Or you want a 
process running on node 1 to be split across nodes 2 and 3: you want 1G to be 
moved to node 2 and the rest to node 3. That cannot be done with the old page 
migration.
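
To make the split case concrete, here is a sketch of what such a request 
has to express. struct split_target and plan_split() are hypothetical 
names made up for this illustration; they are not an existing or proposed 
kernel interface.

```c
#include <stddef.h>

/* Hypothetical description of one destination of a split (illustration only). */
struct split_target {
    int node;                   /* destination node */
    unsigned long long quota;   /* bytes to place there; 0 means "the rest" */
};

/*
 * Distribute `total` bytes over the targets in order.
 * out[i] receives the number of bytes planned for targets[i].
 * Returns the number of bytes that could not be placed.
 */
unsigned long long plan_split(unsigned long long total,
                              const struct split_target *targets, size_t n,
                              unsigned long long *out)
{
    unsigned long long left = total;

    for (size_t i = 0; i < n; i++) {
        unsigned long long take = targets[i].quota;

        /* "the rest", or a quota larger than what remains: take it all */
        if (take == 0 || take > left)
            take = left;
        out[i] = take;
        left -= take;
    }
    return left;
}
```

For a 3G process on node 1 and targets { node 2, 1G } and { node 3, rest }, 
this plans 1G for node 2 and 2G for node 3. A one-to-one node translation 
has no way to describe that.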

> > Policies should only change with explicit instructions from user space and 
> > not as a side effect of page migration.
> 
> Well, page migration would be an "explicit instruction from user space" 

Existing page migration does not specify a memory policy; it just 
translates it. And it's inflexible and unable to handle some of the common 
situations described above.

> > And curiously with the old page migration code: The only way to change 
> > a memory policy is by page migration, and this happens automatically 
> > behind your back.
> 
> mbind can change policy at any time. Just only for the local
> process, as that is the only one who has enough information
> to really do this.

Which makes mbind useless for the sysadmin and/or batch scheduler in the 
scenarios we are discussing. That is the key reason why we need this patch.
Received on Fri Jul 15 19:11:22 2005