RE: 05e0caad3b7bd0d0fbeff980bca22f186241a501 breaks ia64 kdump

From: Mel Gorman <mel_at_csn.ul.ie>
Date: 2006-11-16 03:26:59
On Wed, 15 Nov 2006, Zou, Nanhai wrote:

>> -----Original Message-----
>> From: Mel Gorman [mailto:mel@csn.ul.ie]
>> Sent: 15 November 2006 7:42
>> To: Zou, Nanhai
>> Cc: Horms; Andy Whitcroft; Linux-IA64; Bob Picco; Andrew Morton; Dave Hansen;
>> Andi Kleen; Benjamin Herrenschmidt; Paul Mackerras; Keith Mannthey; Luck, Tony;
>> KAMEZAWA Hiroyuki; Yasunori Goto; Khalid Aziz
>> Subject: RE: 05e0caad3b7bd0d0fbeff980bca22f186241a501 breaks ia64 kdump
>>
>> On Tue, 14 Nov 2006, Zou, Nanhai wrote:
>>
>>>> -----Original Message-----
>>>> From: linux-ia64-owner@vger.kernel.org
>>>> [mailto:linux-ia64-owner@vger.kernel.org] On Behalf Of Mel Gorman
>>>> Sent: 10 November 2006 19:47
>>>> To: Zou, Nanhai
>>>> Cc: Horms; Andy Whitcroft; Linux-IA64; Bob Picco; Andrew Morton; Dave Hansen;
>>>> Andi Kleen; Benjamin Herrenschmidt; Paul Mackerras; Keith Mannthey; Luck, Tony;
>>>> KAMEZAWA Hiroyuki; Yasunori Goto; Khalid Aziz
>>>> Subject: RE: 05e0caad3b7bd0d0fbeff980bca22f186241a501 breaks ia64 kdump
>>>>
>>>> On Fri, 10 Nov 2006, Zou, Nanhai wrote:
>>>>
>>>>>> -----Original Message-----
>>>>>> From: linux-ia64-owner@vger.kernel.org
>>>>>> [mailto:linux-ia64-owner@vger.kernel.org] On Behalf Of Zou Nan hai
>>>>>> Sent: 3 November 2006 18:07
>>>>>> To: Mel Gorman
>>>>>> Cc: Horms; Andy Whitcroft; Linux-IA64; Bob Picco; Andrew Morton; Dave Hansen;
>>>>>> Andi Kleen; Benjamin Herrenschmidt; Paul Mackerras; Keith Mannthey; Luck, Tony;
>>>>>> KAMEZAWA Hiroyuki; Yasunori Goto; Khalid Aziz
>>>>>> Subject: RE: 05e0caad3b7bd0d0fbeff980bca22f186241a501 breaks ia64 kdump
>>>>>>
>>>>>> On Fri, 2006-11-03 at 17:27, Mel Gorman wrote:
>>>>>>> On Fri, 3 Nov 2006, Zou, Nanhai wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>> 	This patch should fix the issue.
>>>>>>>>
>>>>>>>
>>>>>>> It would appear to fix the issue for IA64 but you are blotting over the
>>>>>>> issue that the map is reporting a one page hole. On arches with really
>>>>>>> adjacent regions that are getting merged, the regions will appear to
>>>>>>> overlap by one page. What can happen is something like this
>>>>>>>
>>>>>>> PFN ranges for nodes
>>>>>>> Node 1: 0 -> 1000
>>>>>>> Node 0: 1000 -> 2000
>>>>>>>
>>>>>> Hi,
>>>>>>  But the patch you and Andy are commenting on is not my patch; it was
>>>>>> in the previous thread.
>>>>>> My patch was in the attachment.
>>>>>>
>>>>>>  Sorry for using Outlook to send that patch as an attachment; my Linux box
>>>>>> was not accessible at the time I was posting the patch.
>>>>>>  I am posting the patch again, with the description copied from my previous mail.
>>>>>>
>>>>>> When the ia64 kernel is configured with the discontiguous memory model,
>>>>>> active ranges are added through efi_memmap_walk(filter_rsvd_memory,
>>>>>> count_node_pages).
>>>>>> filter_rsvd_memory will filter out all regions in rsvd_regions, including:
>>>>>> - boot param
>>>>>> - mem map
>>>>>> - initrd
>>>>>> - command line
>>>>>> - **** kernel code and data ***
>>>>>> - kernel map built from efi memmap
>>>>>> - crash kernel reserved region
>>>>>> So the kernel code and data are excluded even without kdump support;
>>>>>> checking /proc/iomem and dmesg for early_node_data can verify that.
>>>>>> But magically, the first kernel boots happily without any complaint.
>>>>>> I guess that is related to the initial value in memmap.
>>>>>>
>>>>>> This patch uses another filter for add_active_range, excluding only the
>>>>>> crash kernel reserved region if CONFIG_KEXEC is on.
>>>>>>
>>>>>> Thanks
>>>>>> Zou Nan hai
>>>>>> --- a/arch/ia64/mm/discontig.c	2006-11-02 20:09:47.000000000 -0500
>>>>>> +++ b/arch/ia64/mm/discontig.c	2006-11-02 19:57:27.000000000 -0500
>>>>>> @@ -21,6 +21,7 @@
>>>>>>  #include <linux/acpi.h>
>>>>>>  #include <linux/efi.h>
>>>>>>  #include <linux/nodemask.h>
>>>>>> +#include <linux/kexec.h>
>>>>>>  #include <asm/pgalloc.h>
>>>>>>  #include <asm/tlb.h>
>>>>>>  #include <asm/meminit.h>
>>>>>> @@ -653,8 +654,6 @@ void call_pernode_memory(unsigned long s
>>>>>>  static __init int count_node_pages(unsigned long start, unsigned long len, int node)
>>>>>>  {
>>>>>>  	unsigned long end = start + len;
>>>>>> -
>>>>>> -	add_active_range(node, start >> PAGE_SHIFT, end >> PAGE_SHIFT);
>>>>>>  	mem_data[node].num_physpages += len >> PAGE_SHIFT;
>>>>>>  	if (start <= __pa(MAX_DMA_ADDRESS))
>>>>>>  		mem_data[node].num_dma_physpages +=
>>>>>> @@ -669,7 +668,31 @@ static __init int count_node_pages(unsig
>>>>>>
>>>>>>  	return 0;
>>>>>>  }
>>>>>> +static __init int add_active_range_wrapper(unsigned long start,
>>>>>> +		unsigned long len, int node)
>>>>>> +{
>>>>>> +	unsigned long end = start + len;
>>>>>> +	add_active_range(node, start >> PAGE_SHIFT, end >> PAGE_SHIFT);
>>>>>> +	return 0;
>>>>>> +}
>>>>>>
>>>>
>>>> The function name doesn't really tell the reader what it's meant to be
>>>> doing. Something like register_active_ranges() might be a bit better.
>>>>
>>> Ok.
>>>>>> +static int __init
>>>>>> +filter_pernode_memory (unsigned long start, unsigned long end, void *arg)
>>>>>> +{
>>>>>> +	void (*func)(unsigned long, unsigned long, int);
>>>>>> +	func = arg;
>>>>>> +
>>>>>> +#ifdef CONFIG_KEXEC
>>>>>> +	if (start > crashk_res.start && start < crashk_res.end)
>>>>>> +		start = max(start, crashk_res.end);
>>>>>> +	if (end > crashk_res.start && end < crashk_res.end)
>>>>>> +		end = min(end, crashk_res.start);
>>>>
>>>>
>>>> These two checks appear to deliberately avoid registering the kernel image
>>>> as an active range. Was that your intention? If so, will you not hit the
>>>> same problem with initmem?
>>>>
>>>  No, crashk_res.start to crashk_res.end is the hole reserved for the 2nd
>>> kernel.
>>
>> Then it needs a comment to that effect. It's difficult to see what code is
>> executed by the main kernel and what code is executed by the crash kernel.
>>
>>> The kernel itself does not need to set up a memmap for this area; the
>>> 2nd kernel will handle it.
>>
>> Ok, where does that happen?
>>
>  OK, I need to explain how kdump works here.
> The first kernel leaves a big enough hole and will not touch the memory
> in the hole once the crash dump kernel has been loaded into it. Usually
> we put an identical kernel in that hole. But from the first kernel's
> point of view, it knows nothing about the second kernel except
> an entry point. When a crash happens, the first kernel quickly shuts down
> the machine and then jumps to the entry point. The second kernel limits
> its memory accesses to that hole, except to copy crash dump data from the
> first kernel's memory range. So this happens at the second kernel's boot
> time; the first kernel does not need a memory map for the crash area.
>

Ok.
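
For anyone following along, the idea of leaving the crash kernel hole out of
what the first kernel registers can be sketched in plain userspace C. This is
only an illustration, not the patch under discussion: struct range and
exclude_hole() are hypothetical names, and ranges are treated as half-open
[start, end). Unlike the in-place clipping in the quoted hunk, the sketch
returns up to two pieces so that a range fully containing the hole is also
covered; that is a choice made for the example, nothing more.

```c
/* Hypothetical type for illustration; not the kernel's struct resource. */
struct range {
	unsigned long start;	/* inclusive */
	unsigned long end;	/* exclusive */
};

/*
 * Remove 'hole' from 'r' and store the surviving pieces in out[].
 * Returns the number of pieces written: 0, 1 or 2.
 */
static int exclude_hole(struct range r, struct range hole, struct range out[2])
{
	int n = 0;

	/* No overlap: the whole range survives (if it is non-empty). */
	if (hole.end <= r.start || hole.start >= r.end) {
		if (r.start < r.end)
			out[n++] = r;
		return n;
	}

	/* Piece below the hole, if any. */
	if (r.start < hole.start)
		out[n++] = (struct range){ r.start, hole.start };

	/* Piece above the hole, if any. */
	if (hole.end < r.end)
		out[n++] = (struct range){ hole.end, r.end };

	return n;
}
```

A caller would then register each returned piece separately, and register
nothing when the range lies entirely inside the hole.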

>>> As I have mentioned, this bug also exists even
>>> without the kdump patch. You will see that the first kernel's code and data
>>> are not covered by add_active_range if the DISCONTIGMEM model is chosen.
>>>
>>
>> But is its initmem section?
>>
>
>  Yes, the initmem section is inside. Please check
> arch/ia64/kernel/vmlinux.lds.S. add_active_range is called via
> efi_memmap_walk(filter_rsvd_memory, count_node_pages);
> filter_rsvd_memory will exclude everything inside rsvd_region, and kernel
> code and data are in rsvd_region. Please check include/asm-ia64/meminit.h.
>

As you say, it's not clear why the normal discontig kernel boots because 
the regions should have been skipped by add_active_range().

Try your patch and see whether it works for kdump. It should work fine in the
normal case because, at the very worst, slightly more memmap is allocated than
is strictly required.
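
To put a rough shape on "slightly more memmap", here is a small userspace
sketch of the arithmetic. It assumes the cost is one page descriptor per
registered page frame; range_to_pages() and memmap_bytes() are hypothetical
helpers, and the page shift and descriptor size are parameters rather than
values taken from this thread. The frame count uses the same shift-based
conversion as the wrapper in the quoted patch.

```c
/*
 * Number of page frames covered by the byte range [start, start + len),
 * converting both ends to frame numbers by shifting, as the quoted
 * wrapper does before calling add_active_range().
 */
static unsigned long range_to_pages(unsigned long start, unsigned long len,
				    unsigned int page_shift)
{
	unsigned long end = start + len;

	return (end >> page_shift) - (start >> page_shift);
}

/*
 * Bytes of memmap needed for 'pages' frames, assuming one descriptor of
 * 'page_struct_size' bytes per frame (an assumption for illustration).
 */
static unsigned long memmap_bytes(unsigned long pages,
				  unsigned long page_struct_size)
{
	return pages * page_struct_size;
}
```

As an example with assumed numbers: a 64 MB range with a page shift of 14
(16 KB pages) spans 4096 frames, and with an assumed 56-byte descriptor that
comes to 229376 bytes (224 KB) of memmap.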

> Thanks
> Zou Nan hai
>
>

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
Received on Thu Nov 16 03:28:12 2006