RE: [RFC] Better MCA recovery on IPF

From: Alberto Munoz <amunoz_at_vmware.com>
Date: 2003-11-04 10:09:07
Because I was really curious as to how much this field may have changed since
the last time I checked, I read fairly quickly through the paper you mention
below.

As stated in section 5, second paragraph, of the document you reference
below, poisoning does not apply to reads (except for delivering an MCA at any
read attempt of the poisoned data). The main value of poisoning is to avoid
delivering a machine check "out of context" when it would be caused by a
write operation. The problem is that an execution context (or thread, or
process) is allowed to retire write operations BEFORE the data has actually
been safely stored in memory. For example, you can complete a write operation
to the cache, and then have an error occur when the data is written from the
cache to main memory. Unfortunately, when this error occurs, chances are that
the original context that generated the write may no longer be executing. It
is also possible that the written data will never be used again, in which
case generating an MCA would be wasteful. Instead of generating an MCA, the
hardware marks the data as poisoned (in an implementation-specific way that
allows the data to move through the memory hierarchy without generating
MCAs).
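
To make the sequence above concrete, here is a toy sketch in Python. It is
only an illustration of the idea as described in this message, not how any
real processor, chipset, or the Linux kernel implements poisoning. All names
(ToyMemory, MachineCheck, write_back, read) are hypothetical, and the rule
that a later clean write clears the poison is an assumption of the sketch.

```python
class MachineCheck(Exception):
    """Stands in for an MCA delivered to the context that reads bad data."""


class ToyMemory:
    """Toy model of write-back with data poisoning (hypothetical names)."""

    def __init__(self):
        self.cells = {}        # address -> value
        self.poisoned = set()  # addresses whose data is marked poisoned

    def write_back(self, addr, value, error=False):
        # The writing context has already retired the store, so an error
        # found here is recorded as poison instead of raising an MCA.
        self.cells[addr] = value
        if error:
            self.poisoned.add(addr)
        else:
            # Assumption of this sketch: a clean write replaces poisoned data.
            self.poisoned.discard(addr)

    def read(self, addr):
        # Any read attempt of poisoned data delivers the MCA, in the
        # context of whoever is doing the read.
        if addr in self.poisoned:
            raise MachineCheck(f"poisoned data read at {addr:#x}")
        return self.cells[addr]


if __name__ == "__main__":
    mem = ToyMemory()
    mem.write_back(0x1000, 42)             # clean write-back
    mem.write_back(0x2000, 7, error=True)  # error on write-back: poisoned, no MCA
    print(mem.read(0x1000))
    try:
        mem.read(0x2000)                   # the consumer takes the MCA here
    except MachineCheck as exc:
        print("MCA:", exc)
```

If the poisoned location is never read again, nothing is ever raised in this
model, which matches the "would be wasteful" case above.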

I still believe that a failed speculative read (for example, of poisoned data)
will generate an MCA. Perhaps someone from Intel can confirm or deny?

Bert Munoz

> -----Original Message-----
> From: Jack Steiner [mailto:steiner@sgi.com]
> Sent: Monday, November 03, 2003 11:29 AM
> To: Alberto Munoz
> Cc: Matthias Fouquet-Lapar; Russ Anderson; linux-ia64@vger.kernel.org
> Subject: Re: [RFC] Better MCA recovery on IPF
> 
> 
> On Mon, Nov 03, 2003 at 10:42:48AM -0800, Alberto Munoz wrote:
> > 
> > 
> > > > Hi,
> > > > 
> > > > I just wondered if a speculative load hitting a cache or memory
> > > > error causes an exception on IA64?
> > > 
> > > I don't think a speculative load should cause a problem - at least
> > > until code tries to consume the data by transferring it to a
> > > processor register.
> > 
> > If you are doing a read (which is what a speculative load will be
> > generating), the error will be generated by whatever part of the logic
> > detects it. You cannot possibly send poisoned data through a memory bus
> > and a system bus (at least not the Intel system buses I am familiar
> > with) without having some of the error checking logic (ECC or parity)
> > complaining about it (this means generating an MCA).
> 
> 
> As the poisoned data flows through the buses, errors may be reported,
> but these errors are not reported to the OS as uncorrected/fatal MCA
> errors. Depending on your chipset, errors are logged as platform errors.
> 
> 
> There is a good paper by Tony Luck (Intel) that describes data poisoning
> as used in IA64. You can find it on Google or at:
> 
> 	archive.linuxsymposium.org/ols2003/Proceedings/All-Reprints/Reprint-Luck-OLS2003.pdf
> 
> See the section on "data poisoning".
> 
> > 
> > > As I understand the CPU architecture, an error that occurs reading
> > > data will result in a poisoned cache line being delivered to the CPU
> > > cache. The poisoned cache line can stay in the cache forever. No MCA
> > > error is reported until the data is actually consumed by transferring
> > > the data from cache to a CPU register.
> > 
> > The problem is that the cache error checking logic has no way of
> > knowing that the data it is about to supply to some register is going
> > to be used for a speculative operation. The cache logic is pretty far
> > away (in processor terms) from the decoding logic.
> > 
> > Bert Munoz
> > 
> > > This requires some support from the chipset. Some chipsets don't
> > > fully support this error model.
> > > 
> 
> -- 
> Thanks
> 
> Jack Steiner (steiner@sgi.com)          651-683-5302
> Principal Engineer                      SGI - Silicon Graphics, Inc.
> 


Received on Mon Nov 3 18:09:40 2003
