Re: [RFC] Better MCA recovery on IPF

From: Jack Steiner <steiner_at_sgi.com>
Date: 2003-11-04 06:28:56
On Mon, Nov 03, 2003 at 10:42:48AM -0800, Alberto Munoz wrote:
> 
> 
> > > Hi,
> > > 
> > > I just wondered whether a speculative load hitting a cache or memory
> > > error causes an exception on IA64?
> > 
> > I don't think a speculative load should cause a problem, at least not
> > until code tries to consume the data by transferring it to a processor
> > register.
> 
> If you are doing a read (which is what a speculative load will be
> generating), the error will be generated by whatever part of the logic
> detects it. You cannot possibly send poisoned data through a memory bus and
> a system bus (at least not the Intel system buses I am familiar with)
> without some of the error checking logic (ECC or parity) complaining about
> it (which means generating an MCA).


As poisoned data flows through the buses, errors may be detected and logged,
but they are not reported to the OS as uncorrected/fatal MCA errors. Depending
on your chipset, these errors are logged as platform errors.


There is a good paper by Tony Luck (Intel) that describes data poisoning as
used in IA64. You can find it via Google or at:

	archive.linuxsymposium.org/ols2003/Proceedings/All-Reprints/Reprint-Luck-OLS2003.pdf

See the section on "data poisoning".

> 
> > As I understand the CPU architecture, an error that occurs while reading
> > data will result in a poisoned cache line being delivered to the CPU
> > cache. The poisoned cache line can stay in the cache forever. No MCA
> > error is reported until the data is actually consumed by transferring it
> > from the cache to a CPU register.
> 
> The problem is that the cache error checking logic has no way of knowing that
> the data it is about to supply to some register is going to be used for a
> speculative operation. The cache logic is pretty far away (in processor
> terms) from the decoding logic.
> 
> Bert Munoz
> 
> > This requires some support from the chipset. Some chipsets don't fully
> > support this error model.

-- 
Thanks

Jack Steiner (steiner@sgi.com)          651-683-5302
Principal Engineer                      SGI - Silicon Graphics, Inc.


Received on Mon Nov 3 14:37:51 2003