Re: [Lse-tech] RE: [PATCH] HUGETLB memory commitment

From: Ray Bryant <raybry_at_sgi.com>
Date: 2004-04-06 11:05:56
Hi Ken,

Chen, Kenneth W wrote:
>>>>>Ray Bryant wrote on Monday, April 05, 2004 11:22 AM
>>>
>>>Chen, Kenneth W wrote:
>>>I actually started coding yesterday.  It doesn't look too bad (I think).
>>>I will post it once I finished it up later today or tomorrow.
>>
>>Hmmm...so did I.  Oh well.  We can pull the good ideas from both. :-)
> 
> 
> I did have a revelation from your original demand-paging patch with per-inode
> tracking ;-)  I extended it into tracking by struct address_space (so we don't
> pollute inode structure) and added per-block tracking.  See patch at the end of
> this post. I admit I had very pessimistic thoughts until I saw your patch.
> 

Cool!

Either way works, I think.  I just used the u.generic_ip pointer because it 
was there and convenient.  It's intended to be a hook in the VFS layer for a 
particular file system to add info to the inode, as near as I can tell, but 
neither hugetlbfs nor shmfs use it, so it was free for the taking.

> 
> 
>>>There are still some oddity in lifetime of the huge page reservation,
>>>but that can be discussed once everyone sees the code.
> 
> 
> I was thinking the lifetime of the huge page reservation should be the life
> of a mapping, i.e., only persist across mmap/munmap.  That means add a ref
> count in the per-block tracking.  This seriously complicates the design
> because now, ref count needs to be updated in munmap and fault_hander in
> addition to the mmap and truncate.  Not to mention that Andy Whitcroft already
> pointed out we don't get notification from munmap.  Plus it seriously make
> tracking logic complicated and have performance down side as well.
> 
> I guess everyone is OK with reservation lives until file truncate?

One can certainly argue that the only thing that is required to live until 
file truncate is the contents of the huge pages in the page cache, since 
applications expect that the data will be there in the file/segment across 
program executions until the file is truncated or the segment deleted.

But it certainly makes sense to me that if program A creates an mmap()'d file 
of 10 huge pages, that if Program B comes along later and re-mmaps() that same 
file, that Program B will be guaranteed to be able to touch all 10 pages, even 
if Program A only touched 5.  So that is an argument for having the 
reservation last until file truncate/segment removal time.

Additionally, recall that we are trying to emulate the behavior of the 
hugetlb_prefault() implementation. Under that implementation, if Program A 
would mmap() 10 huge pages, then Program B would be guarenteed not to get a 
SIGBUS when it mmap()'s and references those 10 pages, provided only that the 
underlying file/segment was not deleted in between execution of the two programs.

So, I think we >>have<< to have the reservation last until file 
truncate/segment deletion time.  Fortunately, that turns out to be easier to 
implement as well.  :-)

I'll check through your patch and make sure we've both covered the same bases 
there.  If so, we should be good to go with either version.

-- 
Best Regards,
Ray
-----------------------------------------------
                   Ray Bryant
512-453-9679 (work)         512-507-7807 (cell)
raybry@sgi.com             raybry@austin.rr.com
The box said: "Requires Windows 98 or better",
            so I installed Linux.
-----------------------------------------------

-
To unsubscribe from this list: send the line "unsubscribe linux-ia64" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Received on Mon Apr 5 22:04:36 2004

This archive was generated by hypermail 2.1.8 : 2005-08-02 09:20:25 EST