RE: [PATCH] HUGETLB memory commitment

From: Chen, Kenneth W <kenneth.w.chen_at_intel.com>
Date: 2004-04-04 13:31:30
>>>>> Ray Bryant wrote on Fri, April 02, 2004 7:57 PM
> Chen, Kenneth W wrote:
> >
> > Can we just RIP this whole hugetlb page overcommit?
> >
>
> Ken et al,
>
> Perhaps the following patch might be more to your liking.  I'm
> sorry I haven't been contributing to this discussion -- I've been
> off doing this code first for Altix under 2.4.21 (one's got to eat,
> after all).  Now I've ported the changes forward to Linux 2.6.5-rc3
> and tested them.  The patch below is relative to that version of Linux.

Somehow the patch came through with extra white space at beginning of
each line, but s/^  / / fix that up.


> The hugetlb memory commit code does this with a single global counter:
> htlbzone_reserved, and a per inode reserved page count.  The latter is
> used to decrement the global reserved page count when the inode is
> deleted or the file is truncated.

A simple counter won't work for different file offset mapping.  It has to
be some sort of per-inode, per-block reservation tracking.  I think we are
steering in the right direction though.


> diff -Nru a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> --- a/fs/hugetlbfs/inode.c	Fri Apr  2 19:31:56 2004
> +++ b/fs/hugetlbfs/inode.c	Fri Apr  2 19:31:56 2004
> @@ -59,19 +58,34 @@
>   	if (vma->vm_end - vma->vm_start < HPAGE_SIZE)
>   		return -EINVAL;
>
> -	vma_len = (loff_t)(vma->vm_end - vma->vm_start);
> + 	reserved_pages = (vma->vm_end - vma->vm_start) >> HPAGE_SHIFT;
>
>   	down(&inode->i_sem);
>   	file_accessed(file);
> +
> + 	/* a second mmap() (or a rmap()) can change the reservation */
> + 	prev_reserved_pages = inode->u.data;
> +
> + 	/*
> + 	 * if current mmap() is smaller than previous reservation,
> + 	 * we don't change reservation or quota
> + 	 */
> + 	if (reserved_pages >= prev_reserved_pages) {
> + 		new_reservation = reserved_pages - prev_reserved_pages;
> + 		if ((hugetlb_get_quota(mapping, new_reservation) < 0) ||
> + 			(hugetlb_reserve(new_reservation) < 0)) {
> + 				up(&inode->i_sem);
> + 				return -ENOMEM;
> + 		}
> + 		inode->i_size = reserved_pages << HPAGE_SHIFT;
> + 		inode->u.data = reserved_pages;
> + 	}
> +	up(&inode->i_sem);
> +

This assumes all mmap start from the same file offset. IMO, it's not
generic enough. This code will only reserve 1 page for the following
case, but actually there are 4 mapping totaling 4 pages:

mmap 1 page at file offset 0
mmap 1 page at file offset HPAGE_SIZE,
mmap 1 page at file offset HPAGE_SIZE*2,
mmap 1 page at file offset HPAGE_SIZE*3,

Oh, this code broke file system quota accounting as well.

- Ken


-
To unsubscribe from this list: send the line "unsubscribe linux-ia64" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Received on Sat Apr 3 22:33:42 2004

This archive was generated by hypermail 2.1.8 : 2005-08-02 09:20:25 EST