Re: [Fastboot] Ia64 kdump patch

From: Zou Nan hai <nanhai.zou_at_intel.com>
Date: 2006-07-28 07:23:41
On Fri, 2006-07-28 at 05:41, Jay Lan wrote:
> Hi,
> 
> I applied the patch to 2.6.18-rc2. However, compilation failed
> at machine_shutdown() of arch/ia64/kernel/machine_kexec.c on
> an sn2 machine.
> 
> It was easy to figure out that irq_descp() is gone and idesc->handler
> has been replaced with idesc->chip. But this code in machine_shutdown()
> caused an error:
> 
> ...
> if (cpu != smp_processor_id())
> cpu_down(cpu);
> }
> }
> #elif defined(CONFIG_SMP)
> smp_call_function(kexec_stop_this_cpu, (void *)image->start, 0, 0); <===
> #endif
> 
> 'image' is undefined in the code. Was it a global? Where was it
> declared?
> 
> Thanks,
> - jay
> 
  Hi, can you check whether it works with CONFIG_HOTPLUG_CPU enabled?
  Thanks
  Zou Nan hai
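
A quick way to see which branch of machine_shutdown() a given build takes is to check the kernel configuration. The helper below is only a sketch: the function name config_enabled and the .config path are assumptions for illustration, not part of the patch.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a kernel config option is set to "y".
# $1 = option name (e.g. CONFIG_HOTPLUG_CPU)
# $2 = path to a kernel .config file (assumed location, adjust as needed)
config_enabled() {
    # Match the exact line "OPTION=y"; "# OPTION is not set" does not match.
    grep -q "^$1=y\$" "$2"
}
```

Possible usage from the top of a configured kernel tree: `config_enabled CONFIG_HOTPLUG_CPU .config && echo "hotplug path"`. If the option is off but CONFIG_SMP is on, the build falls into the #elif branch quoted above.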
> 
> Zou, Nanhai wrote:
> >>-----Original Message-----
> >>From: Horms [mailto:horms@verge.net.au]
> >>Sent: June 26, 2006 15:47
> >>To: Zou, Nanhai
> >>Cc: Linux-IA64; khalid_aziz@hp.com; fastboot@lists.osdl.org
> >>Subject: Re: [Fastboot] Ia64 kdump patch
> >>
> >>On Fri, Jun 09, 2006 at 06:47:59AM +0800, Zou Nan hai wrote:
> >>    
> >>>On Thu, 2006-06-08 at 16:35, Horms wrote:
> >>>      
> >>>>On Thu, Jun 08, 2006 at 06:48:23AM +0800, Zou Nan hai wrote:
> >>>>        
> >>>>>The ia64 kdump patch is in 2 parts.
> >>>>>
> >>>>>the kexec-kdump-ia64-2.6.16.patch should apply on top of the previous
> >>>>>kexec patch by Khalid in Tony's test tree.
> >>>>>
> >>>>>the kexec-tools-kdump-ia64.patch should apply to kexec-tools-1.101
> >>>>>with kexec-tools-1.101-kdump.patch
> >>>>>
> >>>>>
> >>>>>To test it:
> >>>>>Build the first kernel as an SMP kernel with KEXEC and KDUMP enabled.
> >>>>>
> >>>>>Boot it with the kernel parameter "crashkernel=XXX@YYY", which
> >>>>>reserves XXX of memory starting at YYY for crash dumping.
> >>>>>Build a UP kernel with KEXEC, KDUMP and VMCORE enabled.
> >>>>>Load this kernel as the crash dump kernel:
> >>>>>kexec -p vmlinux.gz --initrd=initrd --append="...."
> >>>>>
> >>>>>Trigger a crash, for example with
> >>>>>"echo c > /proc/sysrq-trigger".
> >>>>>After the crash kernel boots:
> >>>>>cp /proc/vmcore core
> >>>>>
> >>>>>gdb first_kernel_vmlinux core
> >>>>>
> >>>>>Please test and review.
> >>>>>
> >>>>>Signed-off-by: Khalid Aziz <khalid_aziz@hp.com>
> >>>>>Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
> >>>>>          
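
Before loading the crash kernel in the steps above, it can help to confirm that the first kernel really was booted with a crashkernel= parameter. The following is a minimal sketch; the function name crashkernel_arg is hypothetical, and it only inspects a command-line string rather than touching kexec.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the first crashkernel= parameter
# found in a kernel command line, or fail if there is none.
# $1 = command line text, e.g. "$(cat /proc/cmdline)"
crashkernel_arg() (
    set -f                      # no filename expansion while splitting words
    for word in $1; do
        case $word in
            crashkernel=*)
                printf '%s\n' "${word#crashkernel=}"
                exit 0
                ;;
        esac
    done
    exit 1                      # parameter not present
)
```

Possible usage on the running first kernel: `crashkernel_arg "$(cat /proc/cmdline)" || echo "no crashkernel= reservation"`.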
> >>>>Hi,
> >>>>
> >>>>I'm very excited to be able to play with the new version of this patch,
> >>>>but the version you posted seems to include all of the kexec patch
> >>>>that went into Tony Luck's tree. Here is a rediff relative to the
> >>>>existing kexec patch (no other changes).
> >>>>
> >>>>The code does seem to be working for me. The main difficulty so far
> >>>>seems to have been finding an appropriate place and size for
> >>>>the reserved area. 128M@256M seems to work for me, offering enough
> >>>>memory and not lying on a resource boundary.
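
To judge whether a size@offset choice such as 128M@256M straddles a resource boundary, it is convenient to turn it into a start and end address that can be compared by eye with /proc/iomem. This is only a sketch: the names to_bytes and crash_region are hypothetical, the K/M/G suffix handling is an assumption modelled on common kernel size notation, and both the size and the offset are assumed to be present.

```shell
#!/bin/sh
# Hypothetical helper: convert a size with an optional K/M/G suffix to bytes.
to_bytes() {
    case $1 in
        *[Kk]) echo $(( ${1%?} * 1024 )) ;;
        *[Mm]) echo $(( ${1%?} * 1024 * 1024 )) ;;
        *[Gg]) echo $(( ${1%?} * 1024 * 1024 * 1024 )) ;;
        *)     echo $(( $1 )) ;;
    esac
}

# Hypothetical helper: print "start end" (inclusive, in hex) for a
# size@offset reservation such as 128M@256M.
crash_region() {
    size=$(to_bytes "${1%@*}")     # part before the '@'
    start=$(to_bytes "${1#*@}")    # part after the '@'
    printf '0x%x 0x%x\n' "$start" $(( start + size - 1 ))
}
```

For example, `crash_region 128M@256M` prints `0x10000000 0x17ffffff`, which should fall entirely inside a single System RAM range in /proc/iomem.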
> >>>>
> >>>>Lastly, is it possible for you to comment on what areas of concern
> >>>>you have with regard to kdump/kexec on ia64? I am looking to port this
> >>>>code to Xen, as my colleague Magnus Damm and I have already done for
> >>>>i386 (complete) and x86_64 (almost complete).
> >>>>
> >>>>http://lists.xensource.com/archives/html/xen-devel/2006-05/msg01272.html
> >>>>
> >>>>Signed-Off-By: Horms <horms@verge.net.au>
> >>>>
> >>>>        
> >>> Thanks for testing and review.
> >>>
> >>> There is still a lot of work to do for ia64 kdump to be a useful
> >>>and robust feature.
> >>>
> >>> Major issues.
> >>> 1. Full percpu dumping on INIT.
> >>>    You may notice that I only send an IPI to the other CPUs and dump part
> >>>of the registers for the crashing CPU. The other CPUs are just stopped,
> >>>without dumping their status. This is only a temporary hack.
> >>> On other platforms this is done with an NMI; on IA64 we should use INIT
> >>>to notify the other CPUs. I also know that on some platforms there is a
> >>>trigger on the panel that can raise INIT. We could use that to dump at the
> >>>time of a deadlock. But INIT is currently used by MCA, so we need to find
> >>>a way to coordinate with MCA on INIT.
> >>>
> >>> 2. unwind section is missing in vmcore.
> >>>    When you run readelf on vmcore, you may notice there are no unwind
> >>>sections. We should add these percpu stack unwind sections to help dump
> >>>filter tools analyze the core dump.
> >>>
> >>> 3. kdump path at crash time.
> >>>    Currently I still have to do an irq->end on each level-triggered irq;
> >>>without that the MPT fusion driver cannot restart. We should fix this,
> >>>or at least do it in a way that does not touch any memory of the
> >>>previous kernel.
> >>>
> >>> 4. Other than this, we need to port the dump filter to IA64.
> >>>
> >>>There are still some minor issues,
> >>>e.g.
> >>>  When I get a crash while X is active, the new kernel starts up with a
> >>>blank screen (the network is still working). I do perform a brute-force
> >>>VGA reset in the purgatory code, but that seems only to shut down the VGA
> >>>and not to reinitialize it if X is running.
> >>>
> >>>  Currently kexec cannot run on a kexec'd kernel, because the
> >>>memory region of the EFI memmap is not reserved in /proc/iomem. I will send
> >>>a patch to reserve that region later.
> >>>
> >>>There are probably other issues and gaps that still need to be found.
> >>>      
> >>Thanks for that list, it is very useful to me. I hope that I can
> >>find some time to help with some of those problems.
> >>
> >>One thing that I am puzzling over is why you shut down the PCI devices
> >>as part of machine_crash_shutdown(). As I am trying to port your code
> >>to Xen this is quite a problem for me, as I'm not sure that Xen
> >>actually knows enough about PCI to do this. Is it a problem relating
> >>to bringing the devices back online after a reboot? Is it the MPT fusion
> >>problem you mention above?
> >>
> >>    
> >  The list is a bit wrong. I noticed that we don't need to dump an unwind segment to the core file for stack unwinding to work. I am working on full register dumping and on fixing the stack unwind issue.
> >
> > The PCI device shutdown code was there to un-master all the PCI devices so that no DMA transactions would be issued by a device. However, I think we can probably remove this code, because the new kernel's memory space is invisible to the first kernel.
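
Un-mastering a device comes down to clearing the Bus Master Enable bit (bit 2, mask 0x0004) in its 16-bit PCI command register. The patch code itself is not quoted in this thread, so the following only sketches the bit operation on a value; the function name clear_bus_master is hypothetical and nothing here reads or writes real hardware.

```shell
#!/bin/sh
# Hypothetical helper: given a 16-bit PCI command register value, print the
# value with the Bus Master Enable bit (bit 2, mask 0x0004) cleared.
# $1 = current command register value, e.g. 0x0007
clear_bus_master() {
    printf '0x%04x\n' $(( $1 & ~0x0004 & 0xffff ))
}
```

For example, `clear_bus_master 0x0007` prints `0x0003`: I/O and memory decoding stay enabled while the device can no longer initiate DMA.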
> >
> >There is another problem: I call irq->end for each device, and it is not safe to touch any pointer belonging to the previous kernel at crash time.
> >But without this code, the MPT fusion driver is very likely to be unable to restart. It sometimes fails to restart even with the irq->end code. This is an open issue that needs to be fixed.
> >
> >Thanks
> >Zou Nan hai
> >
> >  
> >>Horms
> >>H: http://www.vergenet.net/~horms/          W: http://www.valinux.co.jp/en/
> >>    
> >-
> >To unsubscribe from this list: send the line "unsubscribe linux-ia64" in
> >the body of a message to majordomo@vger.kernel.org
> >More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >  
Received on Fri Jul 28 09:12:09 2006
