Re: [PATCH] discontig patch (work in progress)

From: Christoph Hellwig <>
Date: 2003-09-24 18:43:57
On Tue, Sep 23, 2003 at 06:26:13PM -0700, Jesse Barnes wrote:
> On Tue, Sep 23, 2003 at 04:56:40PM -0700, Jesse Barnes wrote:
> > done, like removing the shouting from GRANULEROUND* and naming nodeid()
> > something better (maybe numa_node_id()?).
> How about I just nuke it since numa_node_id() is already defined in
> include/linux/mmzone.h?  Here's an updated version.  And even though
> I've left the #if defined(VIRTUAL_MEM_MAP) || !defined(DISCONTIGMEM) in
> the ia64 files as opposed to removing them altogether since (1) it's
> more consistent with the rest of the tree that way and (2) I'd like to
> remove that dependency so we can measure the perf. impact of virtual
> memmap on discontig machines like David said he wanted to do for zx1.

The #if defined(VIRTUAL_MEM_MAP) || !defined(DISCONTIGMEM) in generic
code have to go away.  All this mem_map/contig_page_data/etc crap
has should probably go away some day, but for now let's not make it
even messier.

Also in the discontig + vmem_map case you don't want them - always use
the per-node mem_maps even if it's just to avoid the pagetable
lookups and to be more similar to the other arches numa code.

More comments:

-                       call_pernode_memory(__pa(range_start), __pa(range_end), func);
+			call_pernode_memory(range_start, range_end, arg);
-                       (*func)(__pa(range_start), range_end - range_start);
+			(*func)(range_start, range_end, 0);

What's the point of passing arg directly here if we just casted it to
func?  Also the ifdef is horrible.  Please add a call_pernode_memory
wrapper for !CONFIG_DISCONTIGMEM ala

#define call_pernode_memory(start, end, func)	(*func)(start, end, 0)

What about moving call_pernode_memory to discontig.c?


count_cpus() seems to reimplemement nr_cpus_node() from topology.h
badly, or is it just me?


per_cpu_init looks strange, in the !SMP case both implementations
are identical and have an unused variable..

What about just having

extern void *per_cpu_init(void);
# define per_cpu_init()	(__phys_per_cpu_start)

into percpu.h?

This whole per-cpu thing looks like a candidate for the next small patch.


+		if (numnodes == 1 && max_gap < LARGE_GAP) {
+			/* Just one node with no big holes... */
+			vmem_map = (struct page *)0;
+			zones_size[ZONE_DMA] += cdata.min_pfn;
+			zholes_size[ZONE_DMA] += cdata.min_pfn;
+			free_area_init_node(0, NODE_DATA(node), NODE_DATA(node)->node_mem_map,
+					    zones_size, 0, zholes_size);
+		}
+		else {
+			/* allocate virtual mem_map */
+			if (node == 0) {
+				unsigned long map_size;
+				map_size = PAGE_ALIGN(max_low_pfn*sizeof(struct page));
+				vmalloc_end -= map_size;
+				vmem_map = (struct page *) vmalloc_end;
+				efi_memmap_walk(create_mem_map_page_table, 0);
+				printk("Virtual mem_map starts at 0x%p\n", vmem_map);
+				mem_map = vmem_map;
+			}
+			free_area_init_node(node, NODE_DATA(node), vmem_map + cdata.min_pfn,
+					    zones_size, cdata.min_pfn, zholes_size);
+		}

Should look something like

		/* Just one node with no big holes... */
		if (numnodes == 1 && max_gap < LARGE_GAP) {
			zones_size[ZONE_DMA] += cdata.min_pfn;
			zholes_size[ZONE_DMA] += cdata.min_pfn;

			/* XXX: probably already done somewhere else? */
			mem_map = NODE_DATA(node)->node_mem_map; 
			pfn_offset = 0;
		/* allocate virtual mem_map */
		} else if (node == 0) {
			vmalloc_end -=
				PAGE_ALIGN(max_low_pfn * sizeof(struct page));
			mem_map = vmem_map = (struct page *)vmalloc_end;

			efi_memmap_walk(create_mem_map_page_table, 0);
			pfn_offset = cdata.min_pfn;

		free_area_init_node(node, NODE_DATA(node), mem_map + pfn_offset,
				zones_size, pfn_offset, zholes_size);


-                       pgd_populate(&init_mm, pgd, alloc_bootmem_pages(PAGE_SIZE));
+			pgd_populate(&init_mm, pgd, alloc_bootmem_pages_node(NODE_DATA(node), PAGE_SIZE));

This could use some linewraps :)  Also alloc_bootmem_pages_node probably
wants a nid argument instead of of a pgdat, but that's not really in
scope for this series..


asm/mmzone.h looks a bit fishy.  The SN2 and generic cases are the same,
why not merge them.  Also the £error is ugly - it if works for the generic
kernel it'll surely work for a newly added arch, too, no?

What about something like:

/* DIG systems only support rather small configurations for now */
#ifdef CONFIG_IA64_DIG
#define NR_NODES	8
#define NR_MEMBLKS	(NR_NODES * 32) /* interleaved, contiguous memory */
/* Currently SN2 marks the maximum.  Should work for everyone else, too */
#define MAX_PHYSNODE_ID	2048
#define NR_NODES	256

And as asm/mmzone.h is only included for CONFIG_DISCONTIGMEM and
ia64 couples DISCONTIG and NUMA tighly you should just scrap the non-numa
case in this file


    extern int ia64_pfn_valid (unsigned long pfn);
 #  define pfn_valid(pfn)       (((pfn) < max_mapnr) &&
 #  ia64_pfn_valid(pfn))
-# else
 #  define pfn_valid(pfn)       ((pfn) < max_mapnr)
-# endif

Didn't I tell you some time ago that the proper way to write this
would be:

extern int ia64_pfn_valid(unsigned long pfn);
# define ia64_pfn_valid(pfn)	1

and then an unconditional

#define pfn_valid(pfn)		(((pfn) < max_mapnr) && ia64_pfn_valid(pfn))

To unsubscribe from this list: send the line "unsubscribe linux-ia64" in
the body of a message to
More majordomo info at
Received on Wed Sep 24 04:44:39 2003

This archive was generated by hypermail 2.1.8 : 2005-08-02 09:20:17 EST