Re: [Patch]IA64 kexec

From: Khalid Aziz <khalid_aziz_at_hp.com>
Date: 2006-02-15 03:56:36
On Tue, 2006-02-14 at 13:06 +0900, Horms wrote:
> On Mon, Feb 13, 2006 at 09:26:58AM -0800, Luck, Tony wrote:
> > > Here is an as-yet untested forward port of the kexec-ia64 patch to
> > > today's Linus git tree (~2.6.16-rc3).
> > 
> > Thanks for taking a look at this ... I'm glad to see that there is
> > still interest in kexec.
> 
> Likewise.
> 
> In case anyone cares, my interest in kexec is twofold.
> Firstly the ia64 box I have takes a really long time to reboot,
> and it would be nice if kexec could trim that down to speed
> up my crash-and-burn development cycle.
> 
> But more importantly, I'm interested in using it for
> kdump functionality, hopefully in conjunction with Xen - 
> though as you can see, I haven't got that far yet.
> 
> > Khalid Aziz at HP is working on merging the good parts of that patch
> > from Nan Hai with the kexec patch that he had produced earlier.  We
> > should see the results of that merge next week, & I hope to see
> > lots more commentary and testing this time around.
> 
> Awesome, I look forward to seeing it. Would I be right in thinking
> that it will show up on this list?

Yes, I will release my patch to this list later next week.

--
Khalid

> 
> > > I haven't looked into what other features have been added
> > > to other arches' kexec, nor whether the features above are applicable -
> > > seems that they probably are, except that ia64 doesn't have NMI
> > > (right?) so the cpu shutdown would need to be done another way.
> > 
> > Nan Hai makes use of HOTPLUG_CPU to offline the other cpus ... which
> > in many ways is a very elegant solution (as it puts the cpus neatly
> > back into SAL ready for the new OS to bring it back online again).
> > But there are a couple of downsides:
> > 1) Requires CONFIG_HOTPLUG_CPU (perhaps this isn't really a big issue)
> 
> That isn't a particular concern to me. 
> 
> > 2) May run into trouble for kdump case where we'd like to rely on
> > less known state/code to get a good dump when the Linux kernel is
> > known to be in some unstable state.
> > 
> > The ia64 equivalent of NMI (large brick through the window) is INIT.
> > Some systems have a button on the front panel to generate INIT, or
> > have a maintenance processor that can send INIT.  So a good kdump
> > solution should eventually make use of INIT.
> > 
> > -Tony
> 
> On Tue, Feb 14, 2006 at 08:17:35AM +1100, Keith Owens wrote:
> > "Luck, Tony" (on Mon, 13 Feb 2006 09:26:58 -0800) wrote:
> > >The ia64 equivalent of NMI (large brick through the window) is INIT.
> > >Some systems have a button on the front panel to generate INIT, or
> > >have a maintenance processor that can send INIT.  So a good kdump
> > >solution should eventually make use of INIT.
> > 
> > Which raises a small problem.  As of about 2.6.15, INIT is a
> > recoverable event.  INIT _must_ be recoverable, because it can be sent
> > when an MCA occurs and one or more cpus was running with interrupts
> > disabled.  For example, when the cpu that takes the MCA owns a disabled
> > spinlock that other cpus are waiting on.  If INIT is not recoverable
> > then some MCAs that could be recovered also become unrecoverable, at
> > random.
> > 
> > Since INIT is recoverable, pressing NMI gives you a stack trace for
> > each cpu, then the system resumes.  This allows a user to see if the
> > system is making progress, albeit slowly, or if it really is stuck.
> > The downside of a recoverable INIT is that you cannot use it to take a
> > dump, or at least not the first time that NMI is issued.  Unfortunately
> > there is no way to distinguish between an NMI where the user wants to
> > see what the system is doing and an NMI to take a dump.  Nobody has
> > implemented the "Read Programmer's Mind" instruction yet.
> 
> I sense pain. Looking over the code - very naively - would it be
> possible to register an alternate INIT handler when kexecing?
> 
> What I'm getting at is ia64_os_init_dispatch_monarch and
> ia64_os_init_dispatch_slave are basically the same, but r19
> is set so the code knows which variant is running for the core that
> cares. I wonder if an additional bit in r19 could be used by
> alternate handlers that are registered when kexec wants to shut
> down the cpus.
> 
> Of course, this assumes that re-registering handlers is possible,
> which is where the "naive" bit comes in.
> 
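
To make the r19 suggestion above concrete, here is a minimal user-space
sketch in C. It is not kernel code and not taken from any posted patch:
the bit layout (bit 0 = monarch, bit 1 = kexec), the names
INIT_ROLE_MONARCH, INIT_FLAG_KEXEC, init_dispatch() and the action enum
are all hypothetical, chosen only to illustrate how one extra flag bit
could select an alternate path in a shared INIT handler.

```c
/*
 * Illustrative only. Assumed encoding of the value handed over in r19:
 *   bit 0 - set on the monarch cpu, clear on slaves (assumption)
 *   bit 1 - set when kexec has asked for the cpus to be shut down
 *           (hypothetical extra bit, as floated in the mail above)
 */
#define INIT_ROLE_MONARCH 0x1u
#define INIT_FLAG_KEXEC   0x2u

/* What the common handler would do next; names are made up. */
enum init_action {
    INIT_ACT_SLAVE_REPORT,    /* normal INIT on a slave: report state, resume */
    INIT_ACT_MONARCH_REPORT,  /* normal INIT on the monarch: report, resume */
    INIT_ACT_KEXEC_PARK,      /* kexec INIT on a slave: park the cpu */
    INIT_ACT_KEXEC_BOOT       /* kexec INIT on the monarch: start new kernel */
};

/* Decode the flag word once, then pick the path for this cpu. */
static enum init_action init_dispatch(unsigned int r19)
{
    int monarch = (r19 & INIT_ROLE_MONARCH) != 0;

    if (r19 & INIT_FLAG_KEXEC)
        return monarch ? INIT_ACT_KEXEC_BOOT : INIT_ACT_KEXEC_PARK;

    return monarch ? INIT_ACT_MONARCH_REPORT : INIT_ACT_SLAVE_REPORT;
}
```

With this layout the existing monarch/slave behaviour is unchanged when
bit 1 is clear, so the recoverable INIT path stays as it is today.
Whether the handlers can actually be re-registered with SAL at kexec
time is the open question and is not addressed by the sketch.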
-- 
====================================================================
Khalid Aziz                       Open Source and Linux Organization
(970)898-9214                                        Hewlett-Packard
khalid.aziz@hp.com                                  Fort Collins, CO

"The Linux kernel is subject to relentless development" 
                                - Alessandro Rubini


Received on Wed Feb 15 03:57:50 2006
