Hi Keith and all,

concerning this issue, it works well on Bull Novascale 5160.
However, have you tested the INIT feature with a 2.6.15 kernel?
Since that kernel version, I have noticed that on Intel Tiger machines
the behavior is exactly the same as the one you describe below.
After a more detailed investigation with an ITP, I have seen that the
trouble always happens when executing the following code:
________________________________________

ia64_old_stack:
	add regs=MCA_PT_REGS_OFFSET, r3
	mov b0=r2			// save return address
	GET_IA64_MCA_DATA(temp2)
	LOAD_PHYSICAL(p0,temp1,1f)
	;;
	mov cr.ipsr=r0
	mov cr.ifs=r0
	mov cr.iip=temp1
	;;
	invala
	rfi		<---------------------------------------
________________________________________

After the rfi instruction, the kernel INIT handler is called again
instead of executing the code located at the "temp1" address.
Since we provide our own SAL version on NS5160 machines, I think that
the problem might be located at the SAL level.
My understanding is that there might be a malfunction in the SAL
concerning INIT event management: when the psr.mc bit is forced to 0
again, the previous INIT signal is no longer filtered, and the entire
INIT call chain is executed again.
But this is just a personal interpretation and I have no proof of it.
This point has been submitted to Intel gurus and is under investigation.

Best regards,

Francois WELLENREITER

>2.6.16 on SN2, compiled with gcc 3.3.3, no KDB.
>
>The SN2 controller 'NMI' command sends INIT to all processors, one as
>monarch, the rest as slaves. If all the processors are in kernel space
>(including idle) then INIT resumes after dumping the process list. If
>any of the processors are in user space then INIT claims to resume but
>gets something wrong, the system becomes dead.
>
>Send first NMI
>
>  Entered OS INIT handler. PSP=ffe301a0 cpu=0 monarch=0
>  cpu 0, INIT occurred in user space, original stack not modified
>  Entered OS INIT handler. PSP=ffe301a0 cpu=3 monarch=0
>  Entered OS INIT handler. PSP=ffe301a0 cpu=2 monarch=0
>  Entered OS INIT handler. PSP=ffe301a0 cpu=1 monarch=1
>  Delaying for 5 seconds...
>  Processes interrupted by INIT - 0 (cpu 1 task 0xe00000b47a4b8000) 0 (cpu 2 task 0xe00000b47a4e8000) 0 (cpu 3 task 0xe00000b47a500000)
>
>  ... process dump ...
>
>  INIT dump complete. Monarch on cpu 1 returning to normal service.
>  Slave on cpu 0 returning to normal service.
>  Slave on cpu 3 returning to normal service.
>  Slave on cpu 2 returning to normal service.
>
>  ... No response ...
>
>Send second NMI
>
>  Entered OS INIT handler. PSP=ffe301a0 cpu=3 monarch=0
>  Entered OS INIT handler. PSP=ffe301a0 cpu=0 monarch=0
>  cpu 0, INIT inconsistent previous current and r13, original stack not modified
>  Entered OS INIT handler. PSP=ffe301a0 cpu=2 monarch=0
>  Entered OS INIT handler. PSP=ffe301a0 cpu=1 monarch=1
>  Delaying for 5 seconds...
>  Processes interrupted by INIT - 0 (cpu 1 task 0xe00000b47a4b8000) 0 (cpu 2 task 0xe00000b47a4e8000) 0 (cpu 3 task 0xe00000b47a500000)
>
>cpu 0 was running in user space during the first NMI, so the original
>stack was not modified. On the second NMI, current for cpu 0 does not
>match r13. Which means that something went wrong when processing the
>first NMI while the process was in user space.
>
>I am still investigating this problem, but any other eyes on the code
>would be appreciated.
>
>-
>To unsubscribe from this list: send the line "unsubscribe linux-ia64" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at http://vger.kernel.org/majordomo-info.html

Received on Wed Apr 05 22:16:26 2006
This archive was generated by hypermail 2.1.8 : 2006-04-05 22:16:35 EST