Re: [Lse-tech] RE: [PATCH] HUGETLB memory commitment

From: Ray Bryant <raybry_at_sgi.com>
Date: 2004-04-06 01:26:32
Ken,

Chen, Kenneth W wrote:

> 
> A simple counter won't work for different file offset mapping.  It has to
> be some sort of per-inode, per-block reservation tracking.  I think we are
> steering in the right direction though.
> 
> 
> 

OK, pardon my question about test code, that is trivial enough I guess.

Anyway, the only way I can see to make this work with non-zero offset is to 
hang a list of segment descriptors (offset and size) for each reserved segment 
off of the inode.  Then when a new mapping comes in, we search the segment 
list to see if the new offset and size overlaps with any of the existing 
reserved segments.  If it doesn't, then we make a new reservation (and request 
file system quota) for the current size, and add the current request to the 
reserved segment list.  If it does, and it fits entirely in a previously 
reserved segement, then no change to reservation/quota needs to be made.  If 
it only partially fits, then we need to make a new reservation/quota request 
for the number of new huge pages required and update the overlapping segment's 
length to reflect the new reservation.

Then in truncate_hugepages() we can search the segment list again, discarding 
full or partial segments that occur either entirely or partially beyond 
"lstart", as appropropriate and doing hugetlb_unreserve() and 
hugetlbfs_put_quota() for the appropriate number of pages.

This will be quite a bit of code and complexity.  Do we still think this is 
all worth it to follow Andrew's suggestion of no API changes for "allocate on
fault" hugetlbpages?  It would be a lot cleaner just to return SIGBUS if we 
run out of hugepages and be done with it, in spite of the API change.

Is there a simpler way to do the correct reservation?  (One could allocate the 
pages at mmap() time, resurrecting hugetlb_prefault(), but zero the pages at 
fault time, this would solve the original problem we ran into at SGI, but 
would not solve Andi's requirement to postpone allocation so NUMA API's can 
control placement.)

-- 
Best Regards,
Ray
-----------------------------------------------
                   Ray Bryant
512-453-9679 (work)         512-507-7807 (cell)
raybry@sgi.com             raybry@austin.rr.com
The box said: "Requires Windows 98 or better",
            so I installed Linux.
-----------------------------------------------

-
To unsubscribe from this list: send the line "unsubscribe linux-ia64" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Received on Mon Apr 5 12:22:55 2004

This archive was generated by hypermail 2.1.8 : 2005-08-02 09:20:25 EST