Christoph Lameter wrote:
> On Thu, 27 Apr 2006, Zoltan Menyhart wrote:
>
>> I wanted to use the mm semaphore => no need to walk again the
>> pgd ... pte chain.
>
> The pgd ... pte chain does not change even without mmap until
> the usage of the memory area ceases.

It is about un-mapping a zone while another thread faults on an
address belonging to the same zone.

We have got a
	rx = ... -> pgd[i] -> pud[j] -> pmd[k] -> pte[l]
chain to walk in the VHPT miss handler.

Having reached some point in this chain walk, we have got the
physical address of the next page in the chain in a register.
Before we can fetch the next item in the chain, an unpredictably
long time can pass. In the meantime:

- "free_pgtables()" kills the page we are about to touch.
- Someone re-uses the same page for something else.

As we are still holding the same physical address, we fetch an item
from a page that is no longer ours.

Even if this security window is small, it does exist.
The probability of hitting this bug grows on a NUMA machine with
lots of CPUs.

I can accept that the VHPT miss handler cannot be protected by
locks; it is the other end that should use some "careful
un-mapping" in order to avoid race conditions.

Here is what I'm working on:

PTE, PMD and PUD page usage fits perfectly into the RCU approach:

1. The VHPT miss handler is protected by "rcu_read_lock_bh()".
   Not a single instruction is added; the required semantics are
   provided by the fact that interrupts are off.

2. "free_pgtables()" keeps working as it does today for non
   multi-threaded applications.

3. "free_pgtables()" and its subroutines do not actually free the
   PTE, PMD and PUD pages for multi-threaded applications.
   These pages will be set free via a "call_rcu_bh()"-activated
   service.
   (Perhaps the weaker protection "rcu_read_lock()" - "call_rcu()"
   will be enough...)
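To illustrate point 3, here is a minimal user-space sketch of the
deferred-freeing idea. It is not kernel code: "struct pt_page",
"walker_enter()", "walker_exit()", "pt_page_defer_free()" and
"grace_period_flush()" are hypothetical names invented for this
illustration, and the grace period is modelled by a simple walker
count rather than by the real RCU machinery.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a PTE, PMD or PUD page (not a kernel type). */
struct pt_page {
    struct pt_page *next;  /* link in the deferred-free list */
    int freed;             /* set to 1 once the page is released */
};

static int active_walkers;        /* walkers inside the read-side section */
static struct pt_page *deferred;  /* pages unlinked but not yet released */

/* Models entering the miss handler (interrupts off in the real thing). */
static void walker_enter(void)
{
    active_walkers++;
}

/* Models leaving the miss handler. */
static void walker_exit(void)
{
    assert(active_walkers > 0);
    active_walkers--;
}

/* Models the multi-threaded path of free_pgtables(): queue, do not free. */
static void pt_page_defer_free(struct pt_page *page)
{
    page->next = deferred;
    deferred = page;
}

/*
 * Models the deferred service: release the whole batch, but only when
 * no walker can still hold the address of a queued page.
 * Simplification: this waits for every walker, whereas real RCU only
 * waits for those that started before the page was queued.
 * Returns the number of pages released.
 */
static size_t grace_period_flush(void)
{
    size_t released = 0;

    if (active_walkers > 0)
        return 0;

    while (deferred != NULL) {
        struct pt_page *page = deferred;

        deferred = page->next;
        page->next = NULL;
        page->freed = 1;
        released++;
    }
    return released;
}
```

While a walker is active, "grace_period_flush()" releases nothing,
so a stale physical address still points at an intact page. Once
the walker has left, a single call releases every queued page in
one batch.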

Please note that:

- The life span of the PTE, PMD and PUD pages is rather long:
  they are freed when the usage of the memory area ceases,
  provided no other map (using the same PTE, PMD and PUD pages)
  is valid.
- The number of PTE, PMD and PUD pages is much smaller than that
  of the leaf pages. Therefore freeing them is not really
  performance critical.

As the "call_rcu_bh()"-activated freeing service will do batch
processing, there is a chance that freeing the PTE, PMD and PUD
pages in this way will be more efficient than today's
"pte_free()"... etc. services are.

Regards,

Zoltan

Received on Fri Apr 28 17:54:21 2006