Re: [Lse-tech] Re: Hugetlbpages in very large memory machines.......

From: Andrew Morton <akpm_at_osdl.org>
Date: 2004-03-14 19:57:37
Ray Bryant <raybry@sgi.com> wrote:
>
> I agree with the compatibility concern, but the other part of the problem
> is that while hugetlb_prefault() is running, it holds both the mm->mmap_sem in
> write mode and the mm->page_table_lock.  So not only does it take 500 s for
> the mmap() to return on our test system, but ps, top, etc all freeze for the
> duration.  Very irritating, especially on a 64 or 128 P system.

Well that's just a dumb implementation.  hugetlb_prefault() doesn't need
page_table_lock while it is zeroing the page: just drop the lock and test
for -EEXIST returned from add_to_page_cache().

In fact we need to do that anyway: the current code is buggy if some other
process with a different mm gets in there and instantiates the page in the
pagecache before this process does: hugetlb_prefault() will return -EEXIST
instead of simply accepting the race and using the page which someone else
put there.

After we have the page in pagecache we need to retake page_table_lock and
check that the target pte is still pte_none().  If it is not, you know that
some other thread has already instantiated a pte there so the new ref to
the pagecache page can simply be dropped.  See how do_no_page() handles it.
Of course, this only applies if mmap_sem is no longer held in there.
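
A rough userspace sketch of the sequence described above, not kernel code.
Everything prefixed toy_ is a hypothetical stand-in: a pthread mutex plays
the role of page_table_lock, a small array plays the pagecache, and a NULL
slot plays pte_none().  Reference counting is left out, so the place where
the kernel would drop its extra page reference is only marked by a comment.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

#define TOY_NR_SLOTS   8
#define TOY_PAGE_BYTES 4096

/* Hypothetical shared "pagecache", visible to every toy_mm. */
static pthread_mutex_t toy_cache_lock = PTHREAD_MUTEX_INITIALIZER;
static void *toy_page_cache[TOY_NR_SLOTS];

/* Hypothetical per-process state: one lock, one flat "page table". */
struct toy_mm {
    pthread_mutex_t page_table_lock;
    void *pte[TOY_NR_SLOTS];        /* NULL stands in for pte_none() */
};

/* Stand-in for add_to_page_cache(): -EEXIST if the slot is already filled. */
static int toy_add_to_page_cache(void *page, int idx)
{
    int ret = 0;

    pthread_mutex_lock(&toy_cache_lock);
    if (toy_page_cache[idx])
        ret = -EEXIST;
    else
        toy_page_cache[idx] = page;
    pthread_mutex_unlock(&toy_cache_lock);
    return ret;
}

/* Look up whatever page currently sits in the cache slot (may be NULL). */
static void *toy_find_page(int idx)
{
    void *page;

    pthread_mutex_lock(&toy_cache_lock);
    page = toy_page_cache[idx];
    pthread_mutex_unlock(&toy_cache_lock);
    return page;
}

/* Returns 0 on success, -ENOMEM if the page could not be allocated. */
static int toy_prefault_one(struct toy_mm *mm, int idx)
{
    void *page = toy_find_page(idx);

    if (!page) {
        /* Allocate and zero with page_table_lock NOT held. */
        page = calloc(1, TOY_PAGE_BYTES);
        if (!page)
            return -ENOMEM;
        if (toy_add_to_page_cache(page, idx) == -EEXIST) {
            /* Lost the race: accept it and use the page already there. */
            free(page);
            page = toy_find_page(idx);
        }
    }

    /* Retake the lock and recheck the pte before installing anything. */
    pthread_mutex_lock(&mm->page_table_lock);
    if (mm->pte[idx] == NULL)
        mm->pte[idx] = page;
    /* else: another thread got there first; the kernel would drop its ref */
    pthread_mutex_unlock(&mm->page_table_lock);
    return 0;
}
```

Two toy_mm instances that prefault the same index end up pointing at the
same page, which is the "accept the race" behaviour rather than an -EEXIST
failure.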

As for holding mmap_sem for too long, well, that can presumably be worked
around by not mmapping the whole lot in one hit?
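
For illustration only, a userspace sketch of mapping a file in several
smaller mmap() calls instead of one big one, so that each call has less to
prefault.  map_in_chunks() and demo_chunked_mapping() are made-up names, and
the demo uses an ordinary temporary file rather than hugetlbfs.  The chunk
size is assumed to be a multiple of the page size (the huge page size for a
hugetlbfs file), and the resulting mappings are not guaranteed to be
contiguous.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map bytes [0, total) of fd as separate mappings of at most `chunk` bytes.
 * Addresses are stored in out[]; returns the number of mappings, or -1 on
 * failure (in which case everything mapped so far is unmapped again).
 */
static int map_in_chunks(int fd, size_t total, size_t chunk,
                         void **out, int max_out)
{
    int n = 0;
    size_t off;

    if (chunk == 0)
        return -1;

    for (off = 0; off < total; off += chunk) {
        size_t len = (total - off < chunk) ? total - off : chunk;
        void *p;

        if (n == max_out)
            goto fail;
        p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
                 fd, (off_t)off);
        if (p == MAP_FAILED)
            goto fail;
        out[n++] = p;
    }
    return n;

fail:
    /* Every mapping recorded before a failure is a full-sized chunk. */
    while (n > 0)
        munmap(out[--n], chunk);
    return -1;
}

/* Usage example: a 4-page temporary file mapped one page at a time. */
static int demo_chunked_mapping(void)
{
    long page = sysconf(_SC_PAGESIZE);
    FILE *f = tmpfile();
    void *maps[8];
    int n, i;

    if (!f || page <= 0)
        return -1;
    if (ftruncate(fileno(f), (off_t)(4 * page)) != 0) {
        fclose(f);
        return -1;
    }
    n = map_in_chunks(fileno(f), (size_t)(4 * page), (size_t)page, maps, 8);
    for (i = 0; i < n; i++)
        munmap(maps[i], (size_t)page);
    fclose(f);
    return n;
}
```

Whether smaller chunks actually help depends on how much work each mmap()
does under mmap_sem; the sketch only shows the calling pattern.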

Received on Sun Mar 14 03:58:05 2004