Re: [PATCH]send slave cpus to SAL slave loop on crash (IA64)

From: Jay Lan <jlan_at_sgi.com>
Date: 2006-10-31 19:59:49

Zou, Nanhai wrote:
>> -----Original Message-----
>> From: Jay Lan [mailto:jlan@sgi.com]
>> Sent: 2006-10-31 10:53
>> To: Zou, Nanhai
>> Cc: fastboot; Linux-IA64; Jack Steiner; Luck, Tony
>> Subject: Re: [PATCH]send slave cpus to SAL slave loop on crash (IA64)
>>
>> Zou, Nanhai wrote:
>>>> -----Original Message-----
>>>> From: Jay Lan [mailto:jlan@engr.sgi.com]
>>>> Sent: 2006-10-31 4:36
>>>> To: fastboot
>>>> Cc: Linux-IA64; Zou, Nanhai; Jack Steiner; Luck, Tony
>>>> Subject: [PATCH]send slave cpus to SAL slave loop on crash (IA64)
>>>>
>>>> This patch is to fix a problem of interrupts being sent to cpus
>>>> that can not respond.
>>>>
>>>> This patch would return slave cpus to SAL slave loop, at time of
>>>> crash, except cpu0. The cpu0 is a special case as there is no way
>>>> to return it to SAL, so cpu0 is better handled in firmware.
>>>>
>>>> Signed-off-by: Jay Lan <jlan@sgi.com>
>>>>
>>>
>>> Does this fix the I/O interrupt redirect issue on SN?
>> This fixes the hang on SN caused by interrupts being sent to
>> cpus that are not in the slave loop. When one boots up the
>> kexec'ed kernel with 'maxcpus=1', all idle cpus need to
>> be sent back. If they are not returned to the SAL slave
>> loop and are just looping in cpu_relax(), they are considered
>> alive, but interrupts sent to them are lost and the system hangs.
>>
> 
>  But won't this rely on the machine crashing on CPU 0?

We no longer rely on the machine crashing on CPU 0. If the
crashing CPU is not cpu 0 and cpu 0 is not returned to the
slave loop, that case is now handled by our PROM.
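
The per-cpu handling described in this thread can be summarised in
a small sketch. This is only an illustration of the decision, not
code from the patch: the enum, its values, and crash_cpu_fate() are
hypothetical names made up for this example.

```c
#include <assert.h>

/* Hypothetical labels for what happens to each cpu at crash time. */
enum cpu_fate {
    CPU_BOOTS_KDUMP,   /* the crashing cpu boots the second kernel   */
    CPU_TO_SAL_SLAVE,  /* slave cpu returned to the SAL slave loop    */
    CPU_LEFT_TO_PROM   /* cpu 0, not crashing: left to PROM/firmware  */
};

/* Illustrative only: classify one cpu given the id of the crashing cpu. */
static enum cpu_fate crash_cpu_fate(int cpu, int crashing_cpu)
{
    assert(cpu >= 0 && crashing_cpu >= 0);

    if (cpu == crashing_cpu)
        return CPU_BOOTS_KDUMP;
    if (cpu == 0)
        return CPU_LEFT_TO_PROM;  /* cpu 0 cannot be returned to SAL */
    return CPU_TO_SAL_SLAVE;
}
```

For a crash on cpu 2, the sketch classifies cpu 2 as booting the
kdump kernel, cpu 0 as left to the PROM, and every other cpu as
returned to the SAL slave loop.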

However, if somebody tries to boot up a production kernel using the '-le'
option _after_ the kexec'ed kernel is up and running, the third kernel
would not boot unless we boot up the second kernel with cpu 0. I
posted a question earlier on whether running 'kexec -le' on a kexec'ed
kdump kernel is legal, and Vivek responded saying the scenario
is not guaranteed to work. So, I think we are fine here.


>  Current Kdump will boot the second kernel on the crashing CPU.
>  So if the machine crashes and boots on CPU N, CPU 0 will still not be able to redirect interrupts, right?

Yes, and this case is handled in our PROM.

> 
>> This is different from the kexec '--noio' option you added
>> to kexec-tools. We still need that fix.
>>
> 
> 
>  Does the --noio patch work on SN? I remember you mentioned there was still some issue when you were testing the --noio option on an SN system?

We need the --noio option to have kexec-kdump working on SN. The problem
was the patch you posted. It was different from the suggestion you
gave me when we first encountered the problem. If we, as you first
suggested, turn all inline functions defined in
purgatory/arch/ia64/io.h into no-ops, then it works.
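
As a rough illustration of what "no-op" means here, a port I/O
helper pair could be stubbed out as below. The function names,
signatures, and the value returned by the read stub are assumptions
for this sketch; they are not copied from purgatory/arch/ia64/io.h.

```c
#include <stdint.h>

/* Hypothetical write helper: accepts its arguments and touches no hardware. */
static inline void sketch_outb(uint8_t value, uint16_t port)
{
    (void)value;
    (void)port;
}

/* Hypothetical read helper: performs no access and returns a fixed value.
 * Returning 0 is an assumption made for this sketch. */
static inline uint8_t sketch_inb(uint16_t port)
{
    (void)port;
    return 0;
}
```

With stubs like these, callers in purgatory still compile and run,
but no I/O access is ever issued.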

Is there any issue if the noio patch is changed to your original
suggestion?

Thanks,
 - jay


> 
>>> However, this patch will make Kdump depend on the cpu hotplug code, so you may
>>> add the dependency in Kconfig.
>>
>> I thought Khalid Aziz's patch covered this?
>> http://lists.osdl.org/mailman/htdig/fastboot/2006-October/004548.html
>>
> 
>> Thanks,
>>  - jay
>>
>>> Thanks
>>> Zou Nan hai
> -
> To unsubscribe from this list: send the line "unsubscribe linux-ia64" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Received on Tue Oct 31 19:59:21 2006