On Thu, 25 Mar 2004 11:41:53 -0800, David Mosberger <davidm@napali.hpl.hp.com> wrote: >Has anyone studied the impact of doing exponential backoff in >ia64_spinlock_contention. My theory is that it wouldn't buy much _if_ >spinlocks always were in their own cachelines, but since they're not, >not using backoff could cause extra cache-line bouncing. To be >honest, I'd rather not spend time on this myself, since I don't have >convenient access to large machines, but me thinks this is a question >that's long overdue to have a proper answer. Short answer, on one workload it gave ~2.5% improvement on a highly contended lock, but we got much better results by changing the lock type. Somebody had a workload that would result in most cpus spinning on the same inode lock. From our bug tracking system :- "The app is threaded. The threads are synchronized using barriers or something equivalent. After each barrier, each thread opens & reads the next file in a series of files. At any point in time, all threads are reading the same file". 8: 3.99 wall, 28.76 sys, 6.97 user 16: 8.82 wall, 124.99 sys, 24.64 user 32: 23.26 wall, 655.31 sys, 109.93 user 64: 71.78 wall, 3681.90 sys, 858.37 user This was on a 2.4 based kernel with my patch for out of line spinlock contention, using brl, not call. Changing the contention path to do backoff[*] gave approximately a 2.5% improvement in wall time. 8: 3.74 wall, 27.04 sys, 6.51 user 16: 8.41 wall, 118.92 sys, 23.50 user 32: 22.50 wall, 628.80 sys, 111.40 user 64: 69.84 wall, 3491.79 sys, 967.31 user Jack Steiner added this comment to the bug :- "I have seen a number of attempts to use backoff algorithms in kernel code. Most have failed because locks are usually lightly contended and backoff is not needed. It unnecessarily delays the next process that tries to get the lock. For example, if one process holds a lock & ONLY 1 other process is trying to acquire it, backoff is not needed as long as the lock is not shared with other locks or data. There are many cases to consider & in general, code should be optimized for the lightly contended case". Our kernel does not use backoff for spinlocks, the above was just a test. We decided that it was better to redesign the lock. All places where the inode lock was being taken could sleep, so we changed from a spin lock to rwsem. That resulted in the contending threads being rescheduled instead of spinning on the same cache line which gave much better cache traffic and resulted in an overall reduction in wall and system time for this highly contended lock. [*] Extract of patched 2.4 out of line contention code. The exponential backoff code came from early 2.4 kernels. It starts at now&0xf, doubles each time, clips to the low 14 bits then or with 0xf to avoid going to 0. mov now=ar.itc ... and delay=0x3f,now ... .retry: // exponential backoff, kdb, lockmeter etc. go in here // add timeout=now,delay shl delay=delay,1 ;; // FIXME: limit should depend on number of cpus dep delay=delay,r0,0,13 // limit delay to 8192 cycles ;; // delay a little... .wait: sub now=now,timeout or delay=0xf,delay // make sure delay is non-zero (otherwise we get stuck with 0) ;; cmp.lt p15,p0=now,r0 mov now=ar.itc (p15) br.cond.sptk .wait ;; ld4 r21=[r31] ;; cmp4.ne p15,p0=r21,r0 (p15) br.cond.sptk.few .retry - To unsubscribe from this list: send the line "unsubscribe linux-ia64" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.htmlReceived on Thu Mar 25 17:28:13 2004
This archive was generated by hypermail 2.1.8 : 2005-08-02 09:20:24 EST