wrong initial ia64_kr(current_stack) value

From: Chen, Kenneth W <kenneth.w.chen_at_intel.com>
Date: 2003-10-10 01:36:51
We started seeing random kernel hangs at a fairly late stage of booting, when lots of processes are spawned by the init script.  The kernel used is a variant of 2.4.21.  At the time of the hang, one CPU is stuck in the page fault handler with no apparent valid DTLB mapping for the kernel stack.  Interestingly enough, the task that the stuck CPU is executing has its kernel stack allocated out of the 16-32MB physical memory range (we are using a 16MB kernel granule in this exercise).  We finally tracked it down to a bug in _start(), where IA64_KR(CURRENT_STACK) was incorrectly initialized.

This code in head.S is wrong:
    mov r16=KERNEL_TR_PAGE_NUM

    // load the "current" pointer (r13) and ar.k6 with the current task
    mov r13=r2
    mov IA64_KR(CURRENT)=r3         // Physical address

    // initialize k4 to a safe value (64-128MB is mapped by TR_KERNEL)
    mov IA64_KR(CURRENT_STACK)=r16

r16 is loaded with the kernel page number measured in (1<<KERNEL_TR_PAGE_SHIFT) pages, but the check in ia64_switch_to() expects it to be in (1<<IA64_GRANULE_SHIFT) units.  When the granule size is 16MB, we can hit a problem if the task structure area (the 2 pages allocated for task_struct and the kernel stack) of the first process that we switch to out of idle is in the physical address range [16MB,32MB]: the check in ia64_switch_to() will mistakenly conclude that this mapping is already loaded in dtr[2] when in fact it is not.
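
The unit mismatch described above can be sketched in C.  This is only a simplified model, not the kernel code and not the patch: the values of KERNEL_TR_PAGE_SHIFT and KERNEL_TR_PAGE_NUM are assumptions inferred from the "64-128MB is mapped by TR_KERNEL" comment, and the function names are hypothetical.  It shows why a TR page number of 1, read as a granule number, looks like the 16-32MB granule.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; they are not quoted from the kernel source. */
#define KERNEL_TR_PAGE_SHIFT 26   /* assumed 64MB kernel TR page (covers 64-128MB) */
#define KERNEL_TR_PAGE_NUM   1    /* assumed: 64MB / 64MB */
#define IA64_GRANULE_SHIFT   24   /* 16MB granule, as in the report */

#define MB (1024ULL * 1024ULL)

/* Hypothetical model of the ia64_switch_to() check: nonzero means
 * "the stack granule is already mapped, skip reloading dtr[2]". */
static int stack_mapping_assumed_present(uint64_t stack_phys,
                                         uint64_t kr_current_stack)
{
    return (stack_phys >> IA64_GRANULE_SHIFT) == kr_current_stack;
}

/* Hypothetical helper: re-express a kernel TR page number in granule units. */
static uint64_t tr_page_num_in_granules(uint64_t tr_page_num)
{
    return (tr_page_num << KERNEL_TR_PAGE_SHIFT) >> IA64_GRANULE_SHIFT;
}

/* Walk through the failing case, then the same case with granule units. */
static void demonstrate_unit_mismatch(void)
{
    uint64_t stack_phys = 20 * MB;  /* task stack inside [16MB,32MB) */

    /* Buggy initial value: TR page number stored as if it were a granule number. */
    assert(stack_mapping_assumed_present(stack_phys, KERNEL_TR_PAGE_NUM));

    /* Value in granule units: 64MB / 16MB = 4, so the check no longer matches. */
    assert(tr_page_num_in_granules(KERNEL_TR_PAGE_NUM) == 4);
    assert(!stack_mapping_assumed_present(
        stack_phys, tr_page_num_in_granules(KERNEL_TR_PAGE_NUM)));
}
```

With a 64MB granule the two shifts would be equal and both units would coincide, so in this model the false match only shows up with the smaller granule size.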

The hang ends up as a nested TLB fault in which the secondary DTLB miss for the kernel stack never completes.  The root cause is that the initialization code uses the wrong page size when computing the initial "safe" page number stored in the kernel register that records which address is mapped for the stack.  ia64_switch_to() is then confused about which page is mapped in the DTR when coming out of idle, because someone lied to it.

I'm surprised that this bug has gone unnoticed for so long.  It could happen on any SMP system out there, but it is easier for the bug to bite on a system with lots of CPUs.  Here is a patch that fixes the problem.  Kudos to Tony Luck, Kimi Suganuma and Nomura-san for helping me track this down.

- Ken

P.S. Only 2.4.x kernels have this bug.

Received on Thu Oct 9 11:42:19 2003