Re: [Perfctr-devel] Re: quick overview of the perfmon2 interface

From: David Gibson <>
Date: 2005-12-21 10:52:45
On Tue, Dec 20, 2005 at 02:51:56AM -0800, Andrew Morton wrote:
> Stephane Eranian <> wrote:
> > 	or create full sampling measurements.
> > ...
> > 
> > 	Sampling is supported at the user level and also at the kernel level with
> > 	the ability to use a kernel-level sampling buffer. The format of the kernel
> > 	sampling buffer is implemented by a kernel pluggable module. As such it is
> > 	very easy to develop a custom format without any modification to the
> > 	interface nor its implementation.
> Why would one want to change the format of the sampling buffer?
> Would much simplification be realised if we were to remove this option?
> > 	To compensate for limitations of many PMU, such as a small number of
> > 	counters, the interface also exports the notion of event sets and allows
> > 	applications to multiplex sets, thereby allowing more events to be
> > 	measured in a single run than there are actual counters.
> > 
> > 	The PMU register description is implemented by a kernel pluggable module
> > 	thereby allowing new hardware to be supported without the need to wait
> > 	for the next release of the kernel.
> Is that option important, or likely to be useful?  Are you sure there isn't
> some overdesign here?

Heh, I've been wondering that for some time.  It's better than it

> > 	- pfm_load_context(int fd, pfarg_load_t *load)
> > 
> > 	   Attach a perfmon context to either a thread or a processor. In the
> > 	   case of a thread, the thread id is passed. In the case of a
> > 	   processor, the context is bound to the CPU on which the call is
> > 	   performed.
> Why should userspace concern itself with a particular CPU?  That's really
> only valid if the process has bound itself to a single CPU?  If the CPU is
> fully virtuialised by perfmon (it is) then why do we care about individual
> CPU instances?

This option is for a context monitoring the whole system, rather than
a single thread, which you sometimes want to do (because one thread's
actions could have an impact on others, for example).  But a context
can't sensibly track more than one thing simultaneously executing, so
it needs to be bound to a particular CPU.  In practice such a context
would almost invariably be used as part of a 1-per-cpu set of contexts
the results ultimately aggregated.

> > 	A system-wide context allows a tool to monitor all activities on one
> > 	processor, i.e, across all threads. Full System-wide monitoring in an
> > 	SMP system is achieved by creating and binding a perfmon context on
> > 	each processor. By construction, a perfmon context can only be bound
> > 	to one processor at a time.  This design choice is motivated by the
> > 	desire to enforce locality and to simplify the kernel implementation.
> hm.  I'm surprised at such a CPU-centric approach.  I'd have expected to
> see a more task-centric model.

It is task-centric, usually.  The CPU binding is only when doing
full-system monitoring.  The above is a bit unclear, giving undue
emphasis to the per-CPU case: the point is that (from the kernel's
point of view) a context can only ever be active on one CPU at a time.
For a thread-virtualized context (the common case) that happens
automatically because the thread can't run on multiple CPUs at once,
for a full system monitoring context the binding must be explicit instead.

> > 	This is not such a new idea, it is present in OProfile or perfctr.
> > 	However, the interface needs to remains generic and flexible. If
> > 	the sampling buffer is in kernel, its format and what gets recorded
> > 	becomes somehow locked by the implementation. Every tool has different
> > 	needs. For instance, a tool such as Oprofile may want to aggregate
> > 	samples in the kernel, others such as VTUNE want to record all samples
> > 	sequentially. Furthermore, some tools may want to combine in each sample
> > 	PMU information with other kernel level information, such as a kernel
> > 	call stack for instance. It is hard to design a generic buffer format
> > 	that can handle all possible request. Instead, the interface provides
> > 	an infrastructure  in which the buffer format is implemented by a kernel
> > 	module. Each module controls, what gets recorded, how it is recorded,
> > 	how the information is exported to user, when a 'buffer full'
> > 	notification must be sent. The perfmon core has an interface to
> > 	dynamically register new formats. Each format is uniquely identified by
> > 	a 128-bit UUID which is passed by the tool when the context is created.
> > 	Arguments for the buffer format are also passed during this call.
> Well that addresses my earlier questions I guess.
> Is this actually useful?  oprofile is there and works OK.  Again, is there
> overdesign here?
> And why is it necessary to make the presentation of the samples to
> userspace pluggable?  I'd have thought that a single relayfs-based
> implementation would suit all sampling buffer formats.

I think there's some confusion here because the term "pluggable buffer
format" isn't a particularly good one.  It makes sense in the context
of perfmon's internal architecture, but doesn't really convey the
basic idea.

As I understand it from working with the perfmon code a bit, the point
is not so much the buffer itself, but the selection of what things to
sample when and how.  The default format gives a fairly configurable
set of ways to do this but for some applications you'd have a choice
of either not collecting all the data you need, or collecting so much
extra data that it would bog down trying to save and/or aggregate it
all.  So, the point of "custom buffer formats" is more a matter of
being able to add new (potentially application specific) sampling

As you say, relayfs should be fine for the actual presentation of the
buffer to userspace, whatever it contains.

> Overall: I worry about excessive configurability, excessive
> features.

Join the club :)

David Gibson			| I'll have my music baroque, and my code
david AT	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
To unsubscribe from this list: send the line "unsubscribe linux-ia64" in
the body of a message to
More majordomo info at
Received on Wed Dec 21 10:53:44 2005

This archive was generated by hypermail 2.1.8 : 2005-12-21 10:53:50 EST