Linux-2.6.17/Documentation/filesystems/relayfs.txt

relayfs - a high-speed data relay filesystem
============================================

relayfs is a filesystem designed to provide an efficient mechanism for
tools and facilities to relay large and potentially sustained streams
of data from kernel space to user space.

The main abstraction of relayfs is the 'channel'.  A channel consists
of a set of per-cpu kernel buffers each represented by a file in the
relayfs filesystem.  Kernel clients write into a channel using
efficient write functions which automatically log to the current cpu's
channel buffer.  User space applications mmap() the per-cpu files and
retrieve the data as it becomes available.

The format of the data logged into the channel buffers is completely
up to the relayfs client; relayfs does however provide hooks which
allow clients to impose some structure on the buffer data.  relayfs
does not implement any form of data filtering either - this also is
left to the client.  The purpose is to keep relayfs as simple as
possible.

This document provides an overview of the relayfs API.  The details of
the function parameters are documented along with the functions in the
filesystem code - please see that for details.

Semantics
=========

Each relayfs channel has one buffer per CPU; each buffer has one or
more sub-buffers. Messages are written to the first sub-buffer until
it is too full to contain a new message, in which case the message is
written to the next (if available).  Messages are never split across
sub-buffers.  At this point, userspace can be notified so it empties
the first sub-buffer, while the kernel continues writing to the next.

When notified that a sub-buffer is full, the kernel knows how many
bytes of it are padding i.e. unused.  Userspace can use this knowledge
to copy only valid data.
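
The placement rule above can be modelled in a few lines of user-space
C.  This is only a sketch of the semantics, not the relayfs
implementation: struct model_buf, model_write() and
model_valid_bytes() are hypothetical names, the 8-entry padding array
is an arbitrary limit, and consumer notification and the
overwrite/no-overwrite decision are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model of one per-cpu buffer; not kernel code. */
struct model_buf {
        size_t subbuf_size;   /* bytes per sub-buffer */
        size_t n_subbufs;     /* number of sub-buffers (at most 8 here) */
        size_t cur;           /* index of the sub-buffer being written */
        size_t offset;        /* bytes already used in the current one */
        size_t padding[8];    /* padding recorded per filled sub-buffer */
};

/*
 * Place a message of 'len' bytes.  Returns the index of the sub-buffer
 * that receives it, or -1 if the message cannot fit in any sub-buffer.
 * When the message does not fit in the current sub-buffer, the unused
 * tail is recorded as padding and the write moves to the next one, so
 * a message is never split.
 */
static int model_write(struct model_buf *b, size_t len)
{
        assert(b->n_subbufs > 0 && b->n_subbufs <= 8);
        if (len > b->subbuf_size)
                return -1;
        if (b->offset + len > b->subbuf_size) {
                b->padding[b->cur] = b->subbuf_size - b->offset;
                b->cur = (b->cur + 1) % b->n_subbufs;
                b->offset = 0;
        }
        b->offset += len;
        return (int)b->cur;
}

/* Valid (non-padding) bytes in a sub-buffer that has been filled. */
static size_t model_valid_bytes(const struct model_buf *b, size_t idx)
{
        return b->subbuf_size - b->padding[idx];
}
```

With two 100-byte sub-buffers, two 60-byte messages land in
sub-buffers 0 and 1 respectively, and sub-buffer 0 is left with 40
bytes of padding that a reader should skip.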

After copying it, userspace can notify the kernel that a sub-buffer
has been consumed.

relayfs can operate in a mode where it will overwrite data not yet
collected by userspace, rather than waiting for userspace to consume
it.

relayfs itself does not provide for communication of such data between
userspace and kernel, allowing the kernel side to remain simple and
not impose a single interface on userspace. It does provide a set of
examples and a separate helper though, described below.

klog and relay-apps example code
================================

relayfs itself is ready to use, but to make things easier, a couple
of simple utility functions and a set of examples are provided.

The relay-apps example tarball, available on the relayfs sourceforge
site, contains a set of self-contained examples, each consisting of a
pair of .c files containing boilerplate code for each of the user and
kernel sides of a relayfs application; combined these two sets of
boilerplate code provide glue to easily stream data to disk, without
having to bother with mundane housekeeping chores.

The 'klog debugging functions' patch (klog.patch in the relay-apps
tarball) provides a couple of high-level logging functions to the
kernel which allow writing formatted text or raw data to a channel,
regardless of whether a channel to write into exists or not, or
whether relayfs is compiled into the kernel or is configured as a
module.  These functions allow you to put unconditional 'trace'
statements anywhere in the kernel or kernel modules; only when there
is a 'klog handler' registered will data actually be logged (see the
klog and kleak examples for details).

It is of course possible to use relayfs from scratch i.e. without
using any of the relay-apps example code or klog, but you'll have to
implement communication between userspace and kernel, allowing both to
convey the state of buffers (full, empty, amount of padding).

klog and the relay-apps examples can be found in the relay-apps
tarball on http://relayfs.sourceforge.net


The relayfs user space API
==========================

relayfs implements basic file operations for user space access to
relayfs channel buffer data.  Here are the file operations that are
available and some comments regarding their behavior:

open()   enables user to open an _existing_ buffer.

mmap()   results in channel buffer being mapped into the caller's
         memory space. Note that you can't do a partial mmap - you must
         map the entire file, which is NRBUF * SUBBUFSIZE.

read()   read the contents of a channel buffer.  The bytes read are
         'consumed' by the reader i.e. they won't be available again
         to subsequent reads.  If the channel is being used in
         no-overwrite mode (the default), it can be read at any time
         even if there's an active kernel writer.  If the channel is
         being used in overwrite mode and there are active channel
         writers, results may be unpredictable - users should make
         sure that all logging to the channel has ended before using
         read() with overwrite mode.

poll()   POLLIN/POLLRDNORM/POLLERR supported.  User applications are
         notified when sub-buffer boundaries are crossed.

close()  decrements the channel buffer's refcount.  When the refcount
         reaches 0 i.e. when no process or kernel client has the buffer
         open, the channel buffer is freed.


In order for a user application to make use of relayfs files, the
relayfs filesystem must be mounted.  For example,

        mount -t relayfs relayfs /mnt/relay

NOTE:   relayfs doesn't need to be mounted for kernel clients to create
        or use channels - it only needs to be mounted when user space
        applications need access to the buffer data.
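
As a rough illustration of the file operations listed above, the
following user-space sketch opens one per-cpu file, maps it in full
and waits for data with poll().  The helper names relay_map_file() and
relay_wait() are made up for this example, and the caller is assumed
to already know the n_subbufs and subbuf_size values the kernel client
used; how those are communicated is left to the application.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map an entire per-cpu relay file read-only.  'n_subbufs' and
 * 'subbuf_size' must match what the kernel client passed to
 * relay_open(), because partial mappings are not allowed.
 * Returns the mapping, or NULL on failure; on success *fd_out holds
 * the open descriptor, which the caller later passes to close().
 */
static void *relay_map_file(const char *path, size_t n_subbufs,
                            size_t subbuf_size, int *fd_out)
{
        size_t total = n_subbufs * subbuf_size;
        void *map;
        int fd;

        assert(total > 0);
        fd = open(path, O_RDONLY);
        if (fd < 0)
                return NULL;
        map = mmap(NULL, total, PROT_READ, MAP_PRIVATE, fd, 0);
        if (map == MAP_FAILED) {
                close(fd);
                return NULL;
        }
        *fd_out = fd;
        return map;
}

/*
 * Wait up to 'timeout_ms' for the file to become readable, which for a
 * relay file happens when a sub-buffer boundary is crossed.
 * Returns poll()'s result: 1 if ready, 0 on timeout, -1 on error.
 */
static int relay_wait(int fd, int timeout_ms)
{
        struct pollfd pfd = { .fd = fd, .events = POLLIN };

        return poll(&pfd, 1, timeout_ms);
}
```

A caller might use relay_map_file("/mnt/relay/trace0", 4, 262144, &fd),
where the file name and sizes are purely illustrative, then loop on
relay_wait() and copy out each sub-buffer as it becomes ready.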


The relayfs kernel API
======================

Here's a summary of the API relayfs provides to in-kernel clients:


  channel management functions:

    relay_open(base_filename, parent, subbuf_size, n_subbufs,
               callbacks)
    relay_close(chan)
    relay_flush(chan)
    relay_reset(chan)
    relayfs_create_dir(name, parent)
    relayfs_remove_dir(dentry)
    relayfs_create_file(name, parent, mode, fops, data)
    relayfs_remove_file(dentry)

  channel management typically called on instigation of userspace:

    relay_subbufs_consumed(chan, cpu, subbufs_consumed)

  write functions:

    relay_write(chan, data, length)
    __relay_write(chan, data, length)
    relay_reserve(chan, length)

  callbacks:

    subbuf_start(buf, subbuf, prev_subbuf, prev_padding)
    buf_mapped(buf, filp)
    buf_unmapped(buf, filp)
    create_buf_file(filename, parent, mode, buf, is_global)
    remove_buf_file(dentry)

  helper functions:

    relay_buf_full(buf)
    subbuf_start_reserve(buf, length)


Creating a channel
------------------

relay_open() is used to create a channel, along with its per-cpu
channel buffers.  Each channel buffer will have an associated file
created for it in the relayfs filesystem, which can be opened and
mmapped from user space if desired.  The files are named
basename0...basenameN-1 where N is the number of online cpus, and by
default will be created in the root of the filesystem.  If you want a
directory structure to contain your relayfs files, you can create it
with relayfs_create_dir() and pass the parent directory to
relay_open().  Clients are responsible for cleaning up any directory
structure they create when the channel is closed - use
relayfs_remove_dir() for that.

The total size of each per-cpu buffer is calculated by multiplying the
number of sub-buffers by the sub-buffer size passed into relay_open().
The idea behind sub-buffers is that they're basically an extension of
double-buffering to N buffers, and they also allow applications to
easily implement random-access-on-buffer-boundary schemes, which can
be important for some high-volume applications.  The number and size
of sub-buffers is completely dependent on the application and even for
the same application, different conditions will warrant different
values for these parameters at different times.  Typically, the right
values to use are best decided after some experimentation; in general,
though, it's safe to assume that having only 1 sub-buffer is a bad
idea - you're guaranteed to either overwrite data or lose events
depending on the channel mode being used.
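
The size and naming rules above reduce to simple arithmetic and string
formatting, sketched here from the user-space side.  The helper names
relay_buf_bytes() and relay_file_path() are hypothetical, as are the
mount point and base name used with them; only the
"size times count" and "basename followed by cpu number" rules come
from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Total bytes in one per-cpu buffer: n_subbufs * subbuf_size. */
static size_t relay_buf_bytes(size_t subbuf_size, size_t n_subbufs)
{
        assert(subbuf_size > 0 && n_subbufs > 0);
        return subbuf_size * n_subbufs;
}

/*
 * Build the path of the relay file for 'cpu', for example
 * "/mnt/relay/trace0".  'mnt' is wherever relayfs is mounted (plus any
 * directory the client created) and 'base' is the base filename the
 * client passed to relay_open().  Returns 0 on success, or -1 if
 * 'out' is too small to hold the result.
 */
static int relay_file_path(char *out, size_t outlen, const char *mnt,
                           const char *base, unsigned int cpu)
{
        int n;

        assert(mnt != NULL && base != NULL && strlen(base) > 0);
        n = snprintf(out, outlen, "%s/%s%u", mnt, base, cpu);
        return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

For instance, four sub-buffers of 262144 bytes give a 1048576-byte
buffer per cpu, and that is also the length a reader has to pass to
mmap() for each file.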

Channel 'modes'
---------------

relayfs channels can be used in either of two modes - 'overwrite' or
'no-overwrite'.  The mode is entirely determined by the implementation
of the subbuf_start() callback, as described below.  In 'overwrite'
mode, also known as 'flight recorder' mode, writes continuously cycle
around the buffer and will never fail, but will unconditionally
overwrite old data regardless of whether it's actually been consumed.
In no-overwrite mode, writes will fail i.e. data will be lost, if the
number of unconsumed sub-buffers equals the total number of
sub-buffers in the channel.  It should be clear that if there is no
consumer or if the consumer can't consume sub-buffers fast enough,
data will be lost in either case; the only difference is whether data
is lost from the beginning or the end of a buffer.

As explained above, a relayfs channel is made up of one or more
per-cpu channel buffers, each implemented as a circular buffer
subdivided into one or more sub-buffers.  Messages are written into
the current sub-buffer of the channel's current per-cpu buffer via the
write functions described below.  Whenever a message can't fit into
the current sub-buffer, because there's no room left for it, the
client is notified via the subbuf_start() callback that a switch to a
new sub-buffer is about to occur.  The client uses this callback to 1)
initialize the next sub-buffer if appropriate 2) finalize the previous
sub-buffer if appropriate and 3) return a boolean value indicating
whether or not to actually go ahead with the sub-buffer switch.

To implement 'no-overwrite' mode, the kernel client would provide
an implementation of the subbuf_start() callback something like the
following:

static int subbuf_start(struct rchan_buf *buf,
                        void *subbuf,
                        void *prev_subbuf,
                        unsigned int prev_padding)
{
        /* record the padding count in the previous sub-buffer's header */
        if (prev_subbuf)
                *((unsigned *)prev_subbuf) = prev_padding;

        /* no free sub-buffer: refuse the switch, the write is dropped */
        if (relay_buf_full(buf))
                return 0;

        /* reserve header space in the new sub-buffer for its padding */
        subbuf_start_reserve(buf, sizeof(unsigned int));

        return 1;
}

If the current buffer is full i.e. all sub-buffers remain unconsumed,
the callback returns 0 to indicate that the buffer switch should not
occur yet i.e. until the consumer has had a chance to read the current
set of ready sub-buffers.  For the relay_buf_full() function to make
sense, the consumer is responsible for notifying relayfs when
sub-buffers have been consumed via relay_subbufs_consumed().  Any
subsequent attempts to write into the buffer will again invoke the
subbuf_start() callback with the same parameters; only when the
consumer has consumed one or more of the ready sub-buffers will
relay_buf_full() return 0, in which case the buffer switch can
continue.

The implementation of the subbuf_start() callback for 'overwrite' mode
would be very similar:

static int subbuf_start(struct rchan_buf *buf,
                        void *subbuf,
                        void *prev_subbuf,
                        unsigned int prev_padding)
{
        if (prev_subbuf)
                *((unsigned *)prev_subbuf) = prev_padding;

        subbuf_start_reserve(buf, sizeof(unsigned int));

        return 1;
}

In this case, the relay_buf_full() check is meaningless and the
callback always returns 1, causing the buffer switch to occur
unconditionally.  It's also meaningless for the client to use the
relay_subbufs_consumed() function in this mode, as it's never
consulted.
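
The difference between the two callbacks can be tried out without a
kernel by modelling the bookkeeping in user space.  The sketch below
is not relayfs code: struct model_chan, model_switch() and
model_consume() are hypothetical, and the 'full' test (filled minus
consumed sub-buffers reaching n_subbufs) is this model's assumption
about what relay_buf_full() reports.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-buffer counters. */
struct model_chan {
        size_t n_subbufs;   /* sub-buffers in the buffer */
        size_t produced;    /* sub-buffers filled by the writer */
        size_t consumed;    /* sub-buffers the consumer reported done */
        size_t dropped;     /* switch attempts refused (lost writes) */
        int overwrite;      /* non-zero selects 'flight recorder' mode */
        int stalled;        /* last switch attempt was refused */
};

/* Model of the 'full' condition: every sub-buffer is unconsumed. */
static int model_buf_full(const struct model_chan *c)
{
        return c->produced - c->consumed >= c->n_subbufs;
}

/*
 * Called when a message does not fit in the current sub-buffer.
 * Returns 1 if the switch goes ahead, 0 if it is refused.  A refused
 * switch is retried on the next write without counting the filled
 * sub-buffer a second time.
 */
static int model_switch(struct model_chan *c)
{
        assert(c->n_subbufs > 0);
        if (!c->stalled)
                c->produced++;
        if (!c->overwrite && model_buf_full(c)) {
                c->stalled = 1;
                c->dropped++;
                return 0;
        }
        c->stalled = 0;
        return 1;
}

/* Consumer side: the model's analogue of relay_subbufs_consumed(). */
static void model_consume(struct model_chan *c, size_t n)
{
        assert(c->consumed + n <= c->produced);
        c->consumed += n;
}
```

With two sub-buffers and no consumer, the no-overwrite model allows
one switch and refuses every later one until model_consume() is
called, while the overwrite model never refuses and simply wraps
around onto unread data.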

The default subbuf_start() implementation, used if the client doesn't
define any callbacks, or doesn't define the subbuf_start() callback,
implements the simplest possible 'no-overwrite' mode i.e. it does
nothing but return 0.

Header information can be reserved at the beginning of each sub-buffer
by calling the subbuf_start_reserve() helper function from within the
subbuf_start() callback.  This reserved area can be used to store
whatever information the client wants.  In the example above, room is
reserved in each sub-buffer to store the padding count for that
sub-buffer.  This is filled in for the previous sub-buffer in the
subbuf_start() implementation; the padding value for the previous
sub-buffer is passed into the subbuf_start() callback along with a
pointer to the previous sub-buffer, since the padding value isn't
known until a sub-buffer is filled.  The subbuf_start() callback is
also called for the first sub-buffer when the channel is opened, to
give the client a chance to reserve space in it.  In this case the
previous sub-buffer pointer passed into the callback will be NULL, so
the client should check the value of the prev_subbuf pointer before
writing into the previous sub-buffer.
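
On the reading side, a header written this way lets userspace skip the
padding.  The following sketch assumes the exact layout produced by
the example callbacks (an unsigned int padding count at the start of
the sub-buffer, followed by data, followed by padding) and that the
sub-buffer has already been completed, since the count is only filled
in at the switch to the next one.  subbuf_payload() is a hypothetical
helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return a pointer to the valid data in one completed sub-buffer and
 * store its length in *len.  Returns NULL if the stored padding count
 * is larger than the space that follows the header.
 */
static const unsigned char *subbuf_payload(const unsigned char *subbuf,
                                           size_t subbuf_size, size_t *len)
{
        unsigned int padding;

        assert(subbuf_size >= sizeof(padding));
        /* memcpy avoids alignment assumptions about 'subbuf' */
        memcpy(&padding, subbuf, sizeof(padding));
        if (padding > subbuf_size - sizeof(padding))
                return NULL;
        *len = subbuf_size - sizeof(padding) - padding;
        return subbuf + sizeof(padding);
}
```

A consumer would call this once per ready sub-buffer, write out the
returned bytes, and then tell the kernel side that the sub-buffer has
been consumed.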

Writing to a channel
--------------------

Kernel clients write data into the current cpu's channel buffer using
relay_write() or __relay_write().  relay_write() is the main logging
function - it uses local_irq_save() to protect the buffer and should be
used if you might be logging from interrupt context.  If you know
you'll never be logging from interrupt context, you can use
__relay_write(), which only disables preemption.  These functions
don't return a value, so you can't determine whether or not they
failed - the assumption is that you wouldn't want to check a return
value in the fast logging path anyway, and that they'll always succeed
unless the buffer is full and no-overwrite mode is being used, in
which case you can detect a failed write in the subbuf_start()
callback by calling the relay_buf_full() helper function.

relay_reserve() is used to reserve a slot in a channel buffer which
can be written to later.  This would typically be used in applications
that need to write directly into a channel buffer without having to
stage data in a temporary buffer beforehand.  Because the actual write
may not happen immediately after the slot is reserved, applications
using relay_reserve() can keep a count of the number of bytes actually
written, either in space reserved in the sub-buffers themselves or as
a separate array.  See the 'reserve' example in the relay-apps tarball
at http://relayfs.sourceforge.net for an example of how this can be
done.  Because the write is under control of the client and is
separated from the reserve, relay_reserve() doesn't protect the buffer
at all - it's up to the client to provide the appropriate
synchronization when using relay_reserve().
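
The 'separate array' bookkeeping mentioned above might look roughly
like the following user-space model.  It is not taken from the
relay-apps 'reserve' example: every name in it is hypothetical, the
sizes are arbitrary, and it is single-threaded, so the synchronization
that a real client has to provide is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_SUBBUF_SIZE 64
#define MODEL_N_SUBBUFS   2

/* Hypothetical model: a buffer plus separate per-sub-buffer counts. */
struct model_rbuf {
        unsigned char data[MODEL_N_SUBBUFS][MODEL_SUBBUF_SIZE];
        size_t reserved[MODEL_N_SUBBUFS];   /* bytes handed out */
        size_t committed[MODEL_N_SUBBUFS];  /* bytes actually written */
        size_t cur;                         /* sub-buffer in use */
};

/* Reserve 'len' bytes; returns the slot and stores its sub-buffer. */
static void *model_reserve(struct model_rbuf *b, size_t len, size_t *idx)
{
        assert(len > 0 && len <= MODEL_SUBBUF_SIZE);
        if (b->reserved[b->cur] + len > MODEL_SUBBUF_SIZE) {
                /* slot does not fit: move on and reset the counts */
                b->cur = (b->cur + 1) % MODEL_N_SUBBUFS;
                b->reserved[b->cur] = 0;
                b->committed[b->cur] = 0;
        }
        *idx = b->cur;
        b->reserved[b->cur] += len;
        return &b->data[b->cur][b->reserved[b->cur] - len];
}

/* Fill a previously reserved slot and count its bytes as written. */
static void model_commit(struct model_rbuf *b, size_t idx, void *slot,
                         const void *src, size_t len)
{
        memcpy(slot, src, len);
        b->committed[idx] += len;
}

/* True once every byte reserved in the sub-buffer has been written. */
static int model_subbuf_complete(const struct model_rbuf *b, size_t idx)
{
        return b->committed[idx] == b->reserved[idx];
}
```

The point of the counts is that a reader can tell a sub-buffer whose
slots have all been filled from one that still has reserved but
unwritten space, and only process the former.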

Closing a channel
-----------------

The client calls relay_close() when it's finished using the channel.
The channel and its associated buffers are destroyed when there are no
longer any references to any of the channel buffers.  relay_flush()
forces a sub-buffer switch on all the channel buffers, and can be used
to finalize and process the last sub-buffers before the channel is
closed.

Creating non-relay files
------------------------

relay_open() automatically creates files in the relayfs filesystem to
represent the per-cpu kernel buffers; it's often useful for
applications to be able to create their own files alongside the relay
files in the relayfs filesystem as well e.g. 'control' files much like
those created in /proc or debugfs for similar purposes, used to
communicate control information between the kernel and user sides of a
relayfs application.  For this purpose the relayfs_create_file() and
relayfs_remove_file() API functions exist.  For relayfs_create_file(),
the caller passes in a set of user-defined file operations to be used
for the file and an optional void * to a user-specified data item,
which will be accessible via inode->u.generic_ip (see the relay-apps
tarball for examples).  The file_operations are a required parameter
to relayfs_create_file() and thus the semantics of these files are
completely defined by the caller.

See the relay-apps tarball at http://relayfs.sourceforge.net for
examples of how these non-relay files are meant to be used.

Creating relay files in other filesystems
-----------------------------------------

By default of course, relay_open() creates relay files in the relayfs
filesystem.  Because relay_file_operations is exported, however, it's
also possible to create and use relay files in other pseudo-filesystems
such as debugfs.

For this purpose, two callback functions are provided,
create_buf_file() and remove_buf_file().  create_buf_file() is called
once for each per-cpu buffer from relay_open() to allow the client to
create a file to be used to represent the corresponding buffer; if
this callback is not defined, the default implementation will create
and return a file in the relayfs filesystem to represent the buffer.
The callback should return the dentry of the file created to represent
the relay buffer.  Note that the parent directory passed to
relay_open() (and passed along to the callback), if specified, must
exist in the same filesystem the new relay file is created in.  If
create_buf_file() is defined, remove_buf_file() must also be defined;
it's responsible for deleting the file(s) created in create_buf_file()
and is called during relay_close().

The create_buf_file() implementation can also be defined in such a way
as to allow the creation of a single 'global' buffer instead of the
default per-cpu set.  This can be useful for applications interested
mainly in seeing the relative ordering of system-wide events without
the need to bother with saving explicit timestamps for the purpose of
merging/sorting per-cpu files in a postprocessing step.

To have relay_open() create a global buffer, the create_buf_file()
implementation should set the value of the is_global outparam to a
non-zero value in addition to creating the file that will be used to
represent the single buffer.  In the case of a global buffer,
create_buf_file() and remove_buf_file() will be called only once.  The
normal channel-writing functions e.g. relay_write() can still be used
- writes from any cpu will transparently end up in the global buffer -
but since it is a global buffer, callers should make sure they use the
proper locking for such a buffer, either by wrapping writes in a
spinlock, or by copying a write function from relayfs_fs.h and
creating a local version that internally does the proper locking.

See the 'exported-relayfile' examples in the relay-apps tarball for
examples of creating and using relay files in debugfs.

Misc
----

Some applications may want to keep a channel around and re-use it
rather than open and close a new channel for each use.  relay_reset()
can be used for this purpose - it resets a channel to its initial
state without reallocating channel buffer memory or destroying
existing mappings.  It should however only be called when it's safe to
do so i.e. when the channel isn't currently being written to.

Finally, there are a couple of utility callbacks that can be used for
different purposes.  buf_mapped() is called whenever a channel buffer
is mmapped from user space and buf_unmapped() is called when it's
unmapped.  The client can use this notification to trigger actions
within the kernel application, such as enabling/disabling logging to
the channel.


Resources
=========

For news, example code, mailing list, etc. see the relayfs homepage:

    http://relayfs.sourceforge.net


Credits
=======

The ideas and specs for relayfs came about as a result of discussions
on tracing involving the following:

Michel Dagenais         <michel.dagenais@polymtl.ca>
Richard Moore           <richardj_moore@uk.ibm.com>
Bob Wisniewski          <bob@watson.ibm.com>
Karim Yaghmour          <karim@opersys.com>
Tom Zanussi             <zanussi@us.ibm.com>

Also thanks to Hubertus Franke for a lot of useful suggestions and bug
reports.
