ia64 ORDERROUNDDOWN issue

From: xb <xavier.bru_at_bull.net>
Date: 2006-11-30 01:57:22
Hello all,

On some ia64 NUMA platforms with specific memory configurations, the 
2.6.18.3 kernel crashes at system initialisation because of a conflict 
when allocating DMA memory.
The machine has the following memory configuration:

physical address       length      node
0                      2GB         0
4GB                    4GB         1
8GB                    2GB         0

We use 64 KB pages and the default CONFIG_FORCE_MAX_ZONEORDER=17 value, 
which makes it possible to use 4GB huge pages (2^(17-1) * 2^16 bytes).
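
The arithmetic behind the 4GB figure can be sketched as below. This is a 
user-space illustration only, not kernel code: max_block_bytes() is a 
hypothetical helper name, and it assumes that the largest buddy-allocator 
block has order MAX_ORDER-1, with MAX_ORDER equal to 
CONFIG_FORCE_MAX_ZONEORDER.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a kernel function): size in bytes of the largest
 * buddy-allocator block, assuming the largest order is max_order - 1. */
static inline uint64_t max_block_bytes(unsigned int page_shift,
                                       unsigned int max_order)
{
    /* Guard against a shift that would not fit in 64 bits. */
    assert(max_order >= 1 && page_shift + max_order - 1 < 64);
    return (uint64_t)1 << (page_shift + max_order - 1);
}
```

With 64 KB pages (page_shift = 16) and max_order = 17, this gives 2^32 
bytes, i.e. 4GB; with max_order = 18 it gives 8GB.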

After some investigation I found that count_node_pages() was computing 
mem_data[1].min_pfn = 0 and mem_data[1].max_pfn = 20000 for node 1, 
thus conflicting with the 0-2GB DMA memory range on node 0.
This is due to the line:
    start = ORDERROUNDDOWN(start);
that computes the value 0 for the 0x100000000 (4GB) address, because 
PAGE_SIZE<<MAX_ORDER is 2^16 * 2^17 bytes = 8GB.
I suppose the goal was to make sure that the memory range is aligned on a 
4GB boundary (2^(17-1) * 2^16 bytes), and in our case the value should 
not be rounded at all.
I fixed the ORDERROUNDDOWN macro and the system now boots OK.
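
The difference between the two macro variants can be reproduced outside 
the kernel with a small sketch. The _OLD/_NEW suffixes and the 
round_old()/round_new() wrappers are names made up for this illustration; 
PAGE_SHIFT and MAX_ORDER are hard-coded to the values of the configuration 
described above.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 16                        /* 64 KB pages, as in this report */
#define PAGE_SIZE  ((uint64_t)1 << PAGE_SHIFT)
#define MAX_ORDER  17                        /* CONFIG_FORCE_MAX_ZONEORDER=17 */

static_assert(PAGE_SHIFT + MAX_ORDER < 64, "shift must fit in 64 bits");

/* Original macro: rounds down to a PAGE_SIZE << MAX_ORDER (8GB) boundary. */
#define ORDERROUNDDOWN_OLD(n) ((n) & ~((PAGE_SIZE << MAX_ORDER) - 1))
/* Patched macro: rounds down to a PAGE_SIZE << (MAX_ORDER-1) (4GB) boundary. */
#define ORDERROUNDDOWN_NEW(n) ((n) & ~((PAGE_SIZE << (MAX_ORDER - 1)) - 1))

/* Thin wrappers so the two variants can be called like functions. */
static inline uint64_t round_old(uint64_t addr) { return ORDERROUNDDOWN_OLD(addr); }
static inline uint64_t round_new(uint64_t addr) { return ORDERROUNDDOWN_NEW(addr); }
```

For the node 1 range starting at 0x100000000, round_old() yields 0 while 
round_new() leaves the address unchanged.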

I am not sure that this fixes the problem in all cases: with 
CONFIG_FORCE_MAX_ZONEORDER=18, the ORDERROUNDDOWN macro would still 
generate the same problem (mem_data[1].min_pfn=0). This should at least 
be checked in the count_node_pages() function.
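
One possible shape for such a check is sketched below. It is only an 
illustration under assumptions: order_round_down() mirrors the patched 
macro with the configuration passed as parameters, and 
rounded_start_checked() with its lower_bound argument (for example the end 
of a lower range owned by another node) is hypothetical and not an 
existing kernel interface.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr down to a (page size << (max_order - 1)) boundary, mirroring
 * the patched ORDERROUNDDOWN but with the configuration as parameters. */
static inline uint64_t order_round_down(uint64_t addr, unsigned int page_shift,
                                        unsigned int max_order)
{
    assert(max_order >= 1 && page_shift + max_order - 1 < 64);
    uint64_t block = (uint64_t)1 << (page_shift + max_order - 1);
    return addr & ~(block - 1);
}

/* Hypothetical guard: if rounding would move the start below lower_bound
 * (memory that belongs to another node), keep the unrounded start. */
static inline uint64_t rounded_start_checked(uint64_t start, uint64_t lower_bound,
                                             unsigned int page_shift,
                                             unsigned int max_order)
{
    uint64_t rounded = order_round_down(start, page_shift, max_order);
    return rounded < lower_bound ? start : rounded;
}
```

With page_shift = 16 and max_order = 18, order_round_down(0x100000000, ...) 
still returns 0, whereas rounded_start_checked() with lower_bound = 
0x80000000 (the end of the 0-2GB range on node 0) returns 0x100000000.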

--- linux-2.6.18.3/include/asm-ia64/meminit.h    2006-11-19 04:28:22.000000000 +0100
+++ linux-2.6.18.3new/include/asm-ia64/meminit.h    2006-11-29 15:23:37.000000000 +0100
@@ -40,7 +40,7 @@
  */
 #define GRANULEROUNDDOWN(n)    ((n) & ~(IA64_GRANULE_SIZE-1))
 #define GRANULEROUNDUP(n)    (((n)+IA64_GRANULE_SIZE-1) & ~(IA64_GRANULE_SIZE-1))
-#define ORDERROUNDDOWN(n)    ((n) & ~((PAGE_SIZE<<MAX_ORDER)-1))
+#define ORDERROUNDDOWN(n)    ((n) & ~((PAGE_SIZE<<(MAX_ORDER-1))-1))
 
 #ifdef CONFIG_NUMA
   extern void call_pernode_memory (unsigned long start, unsigned long len, void *func);

- traces 
---------------------------------------------------------------------------------------------------
all_unreclaimable? no
lowmem_reserve[]: 0 0 256 256
Linux version 2.6.18.3
...
SRAT Memory (0x0000000000000000 length 0x0000000080000000 type 0x0) in proximity domain 0 enabled
SRAT Memory (0x0000000200000000 length 0x0000000080000000 type 0x0) in proximity domain 0 enabled
SRAT Memory (0x0000000100000000 length 0x0000000100000000 type 0x0) in proximity domain 1 enabled
Number of logical nodes in system = 2
Number of memory chunks in system = 3
...

    ide0: BM-DMA at 0x2080-0x2087
<4>swapper: page allocation failure. order:0, mode:0x21

Call Trace:
 [<a000000100010c30>] show_stack+0x50/0xa0
                                sp=e000000100cdfbf0 bsp=e000000100cd13c8
 [<a000000100010cb0>] dump_stack+0x30/0x60
                                sp=e000000100cdfdc0 bsp=e000000100cd13b0
 [<a0000001000e5f00>] __alloc_pages+0x500/0x540
                                sp=e000000100cdfdc0 bsp=e000000100cd1348
 [<a000000100119830>] alloc_page_interleave+0xd0/0x160
                                sp=e000000100cdfdd0 bsp=e000000100cd1318
 [<a0000001001199f0>] alloc_pages_current+0x130/0x1a0
                                sp=e000000100cdfdd0 bsp=e000000100cd12e8
 [<a0000001000e5f70>] __get_free_pages+0x30/0x100
                                sp=e000000100cdfdd0 bsp=e000000100cd12c0
 [<a0000001003369b0>] swiotlb_alloc_coherent+0x70/0x280
                                sp=e000000100cdfdd0 bsp=e000000100cd1280
 [<a00000010044e510>] ide_setup_dma+0x430/0x8c0
                                sp=e000000100cdfdd0 bsp=e000000100cd1240
 [<a00000010044b2c0>] ide_pci_setup_ports+0xd60/0xea0
                                sp=e000000100cdfdd0 bsp=e000000100cd11a8
 [<a00000010044bc00>] do_ide_setup_pci_device+0x800/0x840
                                sp=e000000100cdfde0 bsp=e000000100cd1138
 [<a00000010044bc80>] ide_setup_pci_device+0x40/0x140
                                sp=e000000100cdfdf0 bsp=e000000100cd1100
 [<a0000001004322d0>] piix_init_one+0x50/0x80
                                sp=e000000100cdfe00 bsp=e000000100cd10d8
 [<a0000001006c8230>] ide_scan_pcidev+0xf0/0x180
                                sp=e000000100cdfe00 bsp=e000000100cd10a8
 [<a0000001006c8300>] ide_scan_pcibus+0x40/0x1e0
                                sp=e000000100cdfe00 bsp=e000000100cd1080
 [<a0000001006c8110>] ide_init+0xb0/0xe0
                                sp=e000000100cdfe00 bsp=e000000100cd1060
 [<a000000100009640>] init+0x380/0x7a0
                                sp=e000000100cdfe00 bsp=e000000100cd1020
 [<a000000100012dd0>] kernel_thread_helper+0xd0/0x100
                                sp=e000000100cdfe30 bsp=e000000100cd0ff0
 [<a000000100009140>] start_kernel_thread+0x20/0x40
                                sp=e000000100cdfe30 bsp=e000000100cd0ff0
Mem-info:

Node 0 DMA free:0kB min:2688kB low:3328kB high:4032kB active:0kB inactive:0kB present:1943936kB pages_scanned:0 all_unreclaimable? yes
Node 1 DMA free:1876352kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB



Received on Thu Nov 30 01:58:20 2006
