Luck, Tony wrote:
>
>>> ld4.acq r28 = [r29] // xtime_lock.sequence. Must come first for locking purposes
>>> + ;;
>>> (p8) mov r2 = ar.itc // CPU_TIMER. 36 clocks latency!!!
>
> The .acq only causes ordering w.r.t. data accesses. The read from ar.itc
> isn't a data access, so potentially it could still float before the
> ld4.acq. Consuming the value loaded into r28 presumably has to
> ensure that the load completes though.

Hmm, then would this problem not happen if the time source were not ar.itc?
If the source is MMIO, the read from the address is a data access, isn't it?

> I'm guessing here ... I haven't cross-checked with the architects.

I'd be glad if we could get a comment from Intel's architects.

> Does moving the "and r28 = ~1,r28" up into this slot hurt latency
> for a single call to gettimeofday()? Presumably it will if
> xtime_lock.sequence is not in the cache.
>
> -Tony

It will, I guess.
Anyway, we should make sure that the load of xtime_lock.sequence has
completed before reading ar.itc.

Thanks,
H.Seto

Received on Wed Jul 11 11:29:06 2007
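
For readers who do not follow IA-64 assembly, the read-side pattern discussed in this message (load the sequence, mask its low bit, read the time source, then re-check the sequence) can be sketched in portable C11. This is only an illustration under stated assumptions, not the kernel's gettimeofday code: `struct seq_clock`, `fake_counter`, `read_counter()`, `clock_write()` and `clock_read()` are hypothetical names, and the counter is an ordinary variable standing in for ar.itc. C11 acquire semantics order memory accesses only, so the sketch does not settle whether a non-memory read such as `mov r2 = ar.itc` can float above the `ld4.acq`; that still needs an architecture-level guarantee.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for xtime_lock plus the value it protects. */
struct seq_clock {
    atomic_uint sequence;      /* odd while a writer is mid-update */
    _Atomic uint64_t base_ns;  /* payload guarded by the sequence */
};

/* Hypothetical time source; a plain variable standing in for ar.itc. */
static uint64_t fake_counter;
static uint64_t read_counter(void) { return fake_counter; }

/* Single-writer update: make the sequence odd, store, make it even again. */
static void clock_write(struct seq_clock *c, uint64_t base_ns)
{
    unsigned s = atomic_load_explicit(&c->sequence, memory_order_relaxed);
    assert((s & 1u) == 0);  /* assumes no other writer is active */
    atomic_store_explicit(&c->sequence, s + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&c->base_ns, base_ns, memory_order_relaxed);
    atomic_store_explicit(&c->sequence, s + 2, memory_order_release);
}

/* Reader: retry until the sequence is even and unchanged across the reads. */
static uint64_t clock_read(struct seq_clock *c)
{
    unsigned start, end;
    uint64_t base, now;

    do {
        /* Acquire load of the sequence comes first, as with ld4.acq.
         * Masking the low bit (like "and r28 = ~1,r28") forces a retry
         * whenever a writer was in progress at this point. */
        start = atomic_load_explicit(&c->sequence, memory_order_acquire) & ~1u;
        base = atomic_load_explicit(&c->base_ns, memory_order_relaxed);
        now = read_counter();
        /* Keep the data reads above from sinking below the re-check. */
        atomic_thread_fence(memory_order_acquire);
        end = atomic_load_explicit(&c->sequence, memory_order_relaxed);
    } while (start != end);

    return base + now;
}
```

The mask does double duty: it consumes the loaded sequence value and makes an odd (writer active) sequence fail the final comparison, so no separate "is a writer active" branch is needed.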
This archive was generated by hypermail 2.1.8 : 2007-07-11 11:29:22 EST