Re: Externalize SLIT table

From: Andi Kleen <ak_at_suse.de>
Date: 2004-11-05 01:29:55
On Thu, Nov 04, 2004 at 08:13:37AM -0600, Jack Steiner wrote:
> On Thu, Nov 04, 2004 at 10:59:08AM +0900, Takayoshi Kochi wrote:
> > Hi,
> > 
> > For wider audience, added LKML.
> > 
> > From: Jack Steiner <steiner@sgi.com>
> > Subject: Externalize SLIT table
> > Date: Wed, 3 Nov 2004 14:56:56 -0600
> > 
> > > The SLIT table provides useful information on internode
> > > distances. Has anyone considered externalizing this
> > > table via /proc or some equivalent mechanism.
> > > 
> > > For example, something like the following would be useful:
> > > 
> > > 	# cat /proc/acpi/slit
> > > 	010 066 046 066
> > > 	066 010 066 046
> > > 	046 066 010 020
> > > 	066 046 020 010
> > > 
> > > If this looks ok (or something equivalent), I'll generate a patch....
> > 
> > For user space to manipulate scheduling domains, pinning processes
> > to some cpu groups etc, that kind of information is very useful!
> > Without this, users have no notion about how far between two nodes.
> > 
> > But ACPI SLIT table is too arch specific (ia64 and x86 only) and
> > user-visible logical number and ACPI proximity domain number is
> > not always identical.
> > 
> > Why not export node_distance() under sysfs?
> > I like (1).
> > 
> > (1) obey one-value-per-file sysfs principle
> > 
> > % cat /sys/devices/system/node/node0/distance0
> > 10
> > % cat /sys/devices/system/node/node0/distance1
> > 66
> 
> I'm not familar with the internals of sysfs. For example, on a 256 node
> system, there will be 65536 instances of
> 	 /sys/devices/system/node/node<M>/distance<N>
> 
> Does this require any significant amount of kernel resources to
> maintain this amount of information.

Yes it does, even with the new sysfs backing store. And reading
it would create all the inodes and dentries, which are quite
bloated.

> 
> I think it would also be useful to have a similar cpu-to-cpu distance
> metric:
> 	% cat /sys/devices/system/cpu/cpu0/distance
> 	10 20 40 60 
> 
> This gives the same information but is cpu-centric rather than
> node centric.


And the same thing for PCI busses, like in this patch. However
for strict ACPI systems this information would need to be gotten
from _PXM first. x86-64 on Opteron currently reads it directly
from the hardware and uses it to allocate DMA memory near the device.

-Andi


diff -urpN -X ../KDIFX linux-2.6.8rc3/drivers/pci/pci-sysfs.c linux-2.6.8rc3-amd64/drivers/pci/pci-sysfs.c
--- linux-2.6.8rc3/drivers/pci/pci-sysfs.c	2004-07-27 14:44:10.000000000 +0200
+++ linux-2.6.8rc3-amd64/drivers/pci/pci-sysfs.c	2004-08-04 02:42:11.000000000 +0200
@@ -17,6 +17,7 @@
 #include <linux/kernel.h>
 #include <linux/pci.h>
 #include <linux/stat.h>
+#include <linux/topology.h>
 
 #include "pci.h"
 
@@ -38,6 +39,15 @@ pci_config_attr(subsystem_device, "0x%04
 pci_config_attr(class, "0x%06x\n");
 pci_config_attr(irq, "%u\n");
 
+static ssize_t local_cpus_show(struct device *dev, char *buf)
+{		
+	struct pci_dev *pdev = to_pci_dev(dev);
+	cpumask_t mask = pcibus_to_cpumask(pdev->bus->number);
+	int len = cpumask_scnprintf(buf, PAGE_SIZE-1, mask);
+	strcat(buf,"\n"); 
+	return 1+len;
+}
+
 /* show resources */
 static ssize_t
 resource_show(struct device * dev, char * buf)
@@ -67,6 +77,7 @@ struct device_attribute pci_dev_attrs[] 
 	__ATTR_RO(subsystem_device),
 	__ATTR_RO(class),
 	__ATTR_RO(irq),
+	__ATTR_RO(local_cpus),
 	__ATTR_NULL,
 };
 


-
To unsubscribe from this list: send the line "unsubscribe linux-ia64" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Received on Thu Nov 4 09:37:29 2004

This archive was generated by hypermail 2.1.8 : 2005-08-02 09:20:32 EST