Re: should ia64_spinlock_contention do backoff?

From: Keith Owens <>
Date: 2004-03-26 09:13:14
On Thu, 25 Mar 2004 11:41:53 -0800, 
David Mosberger <> wrote:
>Has anyone studied the impact of doing exponential backoff in
>ia64_spinlock_contention.  My theory is that it wouldn't buy much _if_
>spinlocks always were in their own cachelines, but since they're not,
>not using backoff could cause extra cache-line bouncing.  To be
>honest, I'd rather not spend time on this myself, since I don't have
>convenient access to large machines, but me thinks this is a question
>that's long overdue to have a proper answer.

Short answer: on one workload it gave a ~2.5% improvement on a highly
contended lock, but we got much better results by changing the lock
itself.

Somebody had a workload that would result in most cpus spinning on the
same inode lock.  From our bug tracking system :- "The app is threaded.
The threads are synchronized using barriers or something equivalent.
After each barrier, each thread opens & reads the next file in a series
of files. At any point in time, all threads are reading the same file".

        wall        sys      user
 8:     3.99      28.76      6.97
16:     8.82     124.99     24.64
32:    23.26     655.31    109.93
64:    71.78    3681.90    858.37

This was on a 2.4-based kernel with my patch for out-of-line spinlock
contention, using brl, not call.  Changing the contention path to do
backoff[*] gave approximately a 2.5% improvement in wall time:

        wall        sys      user
 8:     3.74      27.04      6.51
16:     8.41     118.92     23.50
32:    22.50     628.80    111.40
64:    69.84    3491.79    967.31

Jack Steiner added this comment to the bug :- "I have seen a number of
attempts to use backoff algorithms in kernel code.  Most have failed
because locks are usually lightly contended and backoff is not needed.
It unnecessarily delays the next process that tries to get the lock.

For example, if one process holds a lock & ONLY 1 other process is
trying to acquire it, backoff is not needed as long as the lock is not
shared with other locks or data. There are many cases to consider & in
general, code should be optimized for the lightly contended case".

Our kernel does not use backoff for spinlocks; the above was just a
test.  We decided that it was better to redesign the lock.  All places
where the inode lock was being taken could sleep, so we changed from a
spinlock to an rwsem.  That resulted in the contending threads being
rescheduled instead of spinning on the same cache line, which gave much
better cache traffic and an overall reduction in wall and system time
for this highly contended lock.

[*] Extract of the patched 2.4 out-of-line contention code.  The
exponential backoff code came from early 2.4 kernels.  It starts at
now&0x3f, doubles each time, clips the result to the low 13 bits (at
most ~8192 cycles), then ORs in 0xf so the delay never drops to 0.

        mov now=ar.itc
        and delay=0x3f,now
        // exponential backoff, kdb, lockmeter etc. go in here

        add timeout=now,delay
        shl delay=delay,1
        // FIXME: limit should depend on number of cpus
        dep delay=delay,r0,0,13                 // limit delay to 8192 cycles
        // delay a little...
.wait:  sub now=now,timeout
        or delay=0xf,delay                      // make sure delay is non-zero (otherwise we get stuck with 0)
        ;;
        cmp.lt p15,p0=now,r0                    // still before the timeout?
        mov now=ar.itc
(p15)   br.cond.sptk .wait

        ld4 r21=[r31]
        ;;
        cmp4.ne p15,p0=r21,r0                   // lock still held -> back off and check again
(p15)   br.cond.sptk.few .retry

Received on Thu Mar 25 17:28:13 2004