Re: page fault scalability patch V11 [0/7]: overview

From: Martin J. Bligh <>
Date: 2004-11-21 03:59:36
>> Given that we have contention problems updating a single mm-wide rss and
>> given that the way to fix that up is to spread things out a bit, it seems
>> wildly arbitrary to me that the way in which we choose to spread the
>> counter out is to stick a bit of it into each task_struct.
>> I'd expect that just shoving a pointer into mm_struct which points at a
>> dynamically allocated array[NR_CPUS] of longs would suffice.  We probably
>> don't even need to spread them out on cachelines - having four or eight
>> cpus sharing the same cacheline probably isn't going to hurt much.
>> At least, that'd be my first attempt.  If it's still not good enough, try
>> something else.
> That is what Bill thought too. I guess per-cpu and per-thread rss are
> the leading candidates.
> Per thread rss has the benefits of cacheline exclusivity, and not
> causing task bloat in the common case.
> Per CPU array has better worst case /proc properties, but shares
> cachelines (or not, if using percpu_counter as you suggested).

Per thread seems much nicer to me - mainly because it degrades cleanly to 
a single counter for 99% of processes, which are single threaded.


To unsubscribe from this list: send the line "unsubscribe linux-ia64" in
the body of a message to
More majordomo info at
Received on Sat Nov 20 12:04:30 2004

This archive was generated by hypermail 2.1.8 : 2005-08-02 09:20:32 EST