[Linux-ia64] Re: strange performance behaviour with floats

From: Keith Owens <kaos_at_sgi.com>
Date: 2003-02-24 12:45:10
On Fri, 21 Feb 2003 18:30:32 -0800, 
David Mosberger <davidm@napali.hpl.hp.com> wrote:
>1-cycle loops are
>never optimal on McKinley (that's why the Linux bogomips comes out at
>1438 instead of 2000, for example), though I don't know the exact
>micro-architectural details that cause this.

--- include/asm-ia64/delay.h
+++ include/asm-ia64/delay.h
@@ -71,7 +71,7 @@
 
        __asm__ __volatile__("mov %0=ar.lc;;" : "=r"(saved_ar_lc));
        __asm__ __volatile__("mov ar.lc=%0;;" :: "r"(loops - 1));
-        __asm__ __volatile__("1:\tbr.cloop.sptk.few 1b;;");
+        __asm__ __volatile__("1:\tnop 0;nop 0;nop 0;br.cloop.sptk.few 1b;;");
        __asm__ __volatile__("mov ar.lc=%0" :: "r"(saved_ar_lc));
 }

This patch generated a two-bundle loop as you suggested, but BogoMIPS
went down, not up.

Original code, one-bundle br.cloop:

  CPU 0: base freq=200.000MHz, ITC ratio=9/2, ITC freq=900.000MHz
  Calibrating delay loop... 1347.52 BogoMIPS

Modified code, two-bundle br.cloop:

  CPU 0: base freq=200.000MHz, ITC ratio=9/2, ITC freq=900.000MHz
  Calibrating delay loop... 898.68 BogoMIPS

processor  : 0
vendor     : GenuineIntel
arch       : IA-64
family     : Itanium 2
model      : 0
revision   : 6
archrev    : 0
features   : branchlong
cpu number : 0
cpu regs   : 4
cpu MHz    : 900.000000
itc MHz    : 900.000000
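
For reference, the BogoMIPS figures above can be turned into cycles per
loop iteration. This is only a rough sketch: it assumes the usual
calibrate_delay() reporting, where BogoMIPS is delay-loop iterations per
second divided by 500000, and that the CPU runs at the 900 MHz shown in
the cpuinfo output. The function names are made up for illustration and
are not part of the kernel.

```c
/*
 * Convert a reported BogoMIPS value into delay-loop iterations per
 * second. Assumption: BogoMIPS = loops_per_jiffy * HZ / 500000, as
 * printed by calibrate_delay().
 */
double loops_per_second(double bogomips)
{
        return bogomips * 500000.0;
}

/*
 * Estimate CPU cycles spent per br.cloop iteration, given the reported
 * BogoMIPS and the CPU clock in MHz (hypothetical helper).
 */
double cycles_per_iteration(double bogomips, double cpu_mhz)
{
        return (cpu_mhz * 1e6) / loops_per_second(bogomips);
}
```

Under those assumptions, cycles_per_iteration(1347.52, 900.0) comes out
at about 1.34 for the one-bundle loop, and
cycles_per_iteration(898.68, 900.0) at about 2.00 for the two-bundle
loop. That would be consistent with the padded loop taking two full
cycles per iteration, which is slower than the original.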
Received on Sun Feb 23 17:45:27 2003
