ia64 get_mmu_context patch

From: Peter Keilty <Peter.Keilty_at_hp.com>
Date: 2005-10-28 03:28:14
 
 
Gentlemen,

Please find attached the IA64 context_id patch and supporting data for your
review and consideration.

Regards,
pete

attached mail follows:


 

Hi Tony,

I have attached a patch for an issue with get_mmu_context() seen with the
RHEL 2.6.9-15 base code.
I would like you to take a look, review it, and comment on its merits.
Bob Picco reviewed the patch prior to my sending it to you.
 
Data on problem:

Here are the results of modifying the code to use a bitmap for looking up a
new context_id.

Updated PNG showing the AIM7 shared run with before, after, and clm data.

Lockstat Data:
There are 4 sets of lockstat data, one each for loads of 40K, 30K, 20K, and
40K with no fork test. The lockstat data shows that as the load increases, so
does the contention on the tasklist lock taken in wrap_mmu_context(), along
with the utilization of the ia64_ctx lock and the ia64_global_tlb_purge lock.

get_mmu_context() is called to get a new context id number for a process, to
uniquely identify its address space and TLB entries. If the limit is reached,
wrap_mmu_context() is called to reset the range and flush the TLBs.
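
For reference, this is roughly the allocation path as I read the 2.6-era
code (a simplified sketch, not the patch itself; the real version lives in
include/asm-ia64/mmu_context.h):

static inline unsigned long
get_mmu_context (struct mm_struct *mm)
{
	unsigned long flags, context = mm->context;

	if (context)
		return context;			/* fast path: id already assigned */

	spin_lock_irqsave(&ia64_ctx.lock, flags);
	context = mm->context;			/* re-check under the lock */
	if (context == 0) {
		if (ia64_ctx.next >= ia64_ctx.limit)
			wrap_mmu_context(mm);	/* slow path: find a new free range */
		mm->context = context = ia64_ctx.next++;
	}
	spin_unlock_irqrestore(&ia64_ctx.lock, flags);
	return context;
}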
 
wrap_mmu_context() uses ia64_ctx.max_ctx, which is set based on the ia64
architecture's RID size minus the 3 bits used for the region number. For
Itanium the RID size is 24 bits, so 21 bits are used as the increment number.
This number together with the region register number (3 bits) forms the
process address space number, which is what identifies the TLB entries
belonging to the process. So about 2^21 new processes can run before
context_id wrapping occurs and the system's TLBs are flushed.
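
To make the arithmetic concrete, the RID composition could be sketched like
this (macro and function names here are mine, for illustration only; the
kernel shifts the context id left to leave room for the region number in
the low bits):

#define IA64_RID_BITS		24	/* Itanium RID width */
#define IA64_REGION_BITS	3	/* 8 virtual regions */
#define IA64_CTX_BITS		(IA64_RID_BITS - IA64_REGION_BITS)	/* 21 */
#define IA64_MAX_CTX		((1UL << IA64_CTX_BITS) - 1)	/* ~2.1M ids */

/* rid = context id in the high bits, region number in the low 3 bits */
static inline unsigned long
make_rid (unsigned long context, unsigned long region)
{
	return (context << IA64_REGION_BITS) | region;
}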

The difference in the number of times the ia64_ctx lock was acquired between
running with the fork test and without it, at 40K load:

Lock                          Fork           No Fork
ia64_ctx                   9,111,760        1,162,555
ia64_global_tlb_purge    101,371,392       33,888,711

Read lock acquisitions per second on the task list lock in wrap_mmu_context():
Load       Locks/Sec   Contention
20,000         5.8         20%
30,000        10.2         31%
40,000        14.6         39%
40,000         0.0001       0%   (no fork or exec test)

Notice the utilization percentage for the ia64_ctx and tasklist_lock locks
over the 4 runs:
20K    10%
30K    27%
40K    54%
40K     1%   (no fork)
This utilization is based on the number of cycles the lock was busy during
the measurement period.

As an experiment, I modified the number of bits used in the region id
register for the context increment from 21 to 20 bits. This causes
wrap_mmu_context() to be invoked sooner, after 2^20 contexts instead of 2^21.

AIM7 shared with fork:
Load   21 bits   20 bits
20K      66K       58K   jobs/min
30K      58K       38K   jobs/min
40K      44K       25K   jobs/min
So a 20K load at 20 bits behaves like a 30K load at 21 bits, and a 30K load
behaves like a 40K load at 21 bits.

The cost of searching the entire task list to find a free context number
increases with the number of processes in the test. The following are the
measured times for the wrap_mmu_context() function.

Before:
AIM7 shared with fork/exec tests, time spent in wrap_mmu_context() walking
the task loop.

Load  Jobs/min    Calls   Maxprocs  Cumulative Time  Time/call  Total Run Time
30K      53279    33828      38198       864.0 sec     25.5 ms      62 min
40K      42378    79840      52496      2667.4 sec     33.4 ms     108 min
50K      31955   141000      75472      6011.5 sec     42.6 ms     180 min

Modified patch:
AIM7 shared with fork/exec tests, time spent in wrap_mmu_context() using a
128K bitmap.

Load  Jobs/min     Calls   Maxprocs  Cumulative Time  Time/call  Total Run Time
30K      61350   1764278       na         .312 sec      177 ns      48 min
40K      60858   3203092       na         .561 sec      175 ns      64 min
50K      60826   5110677       na         .887 sec      174 ns      80 min

Making the bitmap 512K reduced the TLB flushes substantially, but the
jobs/min only increased by about 100-150. The 512K size is 1/4 of the maximum
it could be, 2^21. The patch uses the full 2^21 size.
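
The core of the bitmap approach looks roughly like the sketch below (a
simplified rendering under my own naming, not the patch verbatim; the
bitmap field on ia64_ctx and the low reserved range are assumptions, while
find_next_zero_bit() and bitmap_zero() are the standard kernel bitops):

void
wrap_mmu_context (struct mm_struct *mm)
{
	/* find the next free context id, skipping a low reserved
	 * range (the 300 here is illustrative) */
	ia64_ctx.next = find_next_zero_bit(ia64_ctx.bitmap,
					   ia64_ctx.max_ctx, 300);
	if (ia64_ctx.next >= ia64_ctx.max_ctx) {
		/* bitmap exhausted: free all ids and flush every TLB */
		bitmap_zero(ia64_ctx.bitmap, ia64_ctx.max_ctx);
		ia64_ctx.next = 300;
		flush_tlb_all();
	}
	/* get_mmu_context() then marks the id busy with
	 * set_bit(ia64_ctx.next, ia64_ctx.bitmap) before handing it out. */
}

A real implementation also has to re-account for ids still held by live
processes across the flush; the sketch elides that bookkeeping, but the
essential change is scanning a 2^21-bit map instead of walking the whole
task list under tasklist_lock.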

The lockstat data shows 0% contention on the tasklist_lock rwlock in
wrap_mmu_context(), and its utilization goes from 54% to 1.6%.
The ia64_ctx lock utilization goes from 54% to 0.1%, and its contention drops
to 0.3%.

Regards,
pete





Attachments: shared_ctx_bitmap_2meg.png, shared_ctx_bitmap_all.png