> > I can estimate what the procedure includes, such as changing
> > poisoned memory to uncacheable, clearing suspect data in cache, and storing
> > zeros to the poisoned area.
>
> There is no way to tell if the error is soft/transient
> and can be cleared by that sequence, or hard/permanent.

I think there is. Depending on your chipset, you can re-read the memory uncached after all outstanding references have terminated. If you don't get the same error, it is transient. Since I would expect the majority of errors to be transient, I think this really is the right approach.

Again, depending on the chipset architecture, you might want to do some uncached writes/reads ("micro-diagnostics") to identify the problem and confirm its nature. I used similar approaches on other architectures to figure out whether a single-bit error (SBE) was transient or hard. The goal was to stop triggering on SBEs once you knew you had a hard SBE, because of the large overhead.

> The safest option is to simply take the page with
> the error out of service and not re-use it.

One problem might be that you now lose a page of main memory, and it might require an additional TLB entry if you use large memory segments.

- Matthias

>
> -Tony
> -
> To unsubscribe from this list: send the line "unsubscribe linux-ia64" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

Received on Sat Nov 8 02:39:09 2003
This archive was generated by hypermail 2.1.8 : 2005-08-02 09:20:20 EST
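The re-read and micro-diagnostic sequence described in the message can be sketched as follows. This is a minimal user-space simulation under stated assumptions, not the ia64 kernel's code: memory is modelled by a hypothetical `struct sim_word` (a stuck-bit mask stands in for a hard fault, a `bad_ecc` flag for corrupted stored data), and `read_uncached()`, `write_uncached()`, `classify_error()`, `handle_error()` and `struct page_state` are illustrative names only. A real handler would use chipset-specific uncached accesses and the platform's machine-check interfaces, and would first wait for outstanding references to drain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one memory word as seen through uncached accesses. */
struct sim_word {
    uint64_t data;        /* value last written */
    uint64_t stuck_mask;  /* bits that are physically stuck (hard fault) */
    uint64_t stuck_value; /* values the stuck bits always read back as */
    bool     bad_ecc;     /* stored data/ECC corrupted; cleared by a write */
};

enum err_kind { ERR_TRANSIENT, ERR_SOFT_SCRUBBED, ERR_HARD };

/* Hypothetical per-page bookkeeping. */
struct page_state {
    bool     retired;  /* page taken out of service after a hard error */
    unsigned reported; /* number of errors reported for this page */
};

/* Uncached read (simulated); returns false if ECC would flag the access. */
static bool read_uncached(const struct sim_word *w, uint64_t *out)
{
    uint64_t v = (w->data & ~w->stuck_mask) | (w->stuck_value & w->stuck_mask);
    *out = v;
    return !w->bad_ecc && v == w->data;
}

/* Uncached write (simulated); a fresh write regenerates ECC for the word. */
static void write_uncached(struct sim_word *w, uint64_t v)
{
    w->data = v;
    w->bad_ecc = false;
}

/* Classify an error, assuming outstanding references have already drained. */
static enum err_kind classify_error(struct sim_word *w)
{
    /* Test patterns; zeros last so the word ends up zeroed, like the
     * "store zeros to the poisoned area" step. */
    static const uint64_t patterns[] = {
        0xFFFFFFFFFFFFFFFFULL, 0x5555555555555555ULL,
        0xAAAAAAAAAAAAAAAAULL, 0x0000000000000000ULL,
    };
    uint64_t v;

    assert(w != NULL);

    /* Step 1: re-read uncached. A clean read means the fault was transient. */
    if (read_uncached(w, &v))
        return ERR_TRANSIENT;

    /* Step 2: micro-diagnostics. This overwrites the word, which is only
     * acceptable because poisoned data is already lost. */
    for (size_t i = 0; i < sizeof patterns / sizeof patterns[0]; i++) {
        write_uncached(w, patterns[i]);
        if (!read_uncached(w, &v) || v != patterns[i])
            return ERR_HARD;
    }

    /* Every pattern held: the stored data was bad, the cell itself is fine. */
    return ERR_SOFT_SCRUBBED;
}

/* Handle one error on a page; returns true if it was reported. */
static bool handle_error(struct page_state *pg, struct sim_word *w)
{
    if (pg->retired)
        return false; /* known hard fault: skip the reporting overhead */
    pg->reported++;
    if (classify_error(w) == ERR_HARD)
        pg->retired = true; /* take the page out of service */
    return true;
}
```

The three outcomes map onto the cases in the thread: a clean re-read is transient, a passing pattern test means the bad data has been scrubbed, and a failing pattern test marks the page retired so later errors on it are no longer classified or reported. A handler for corrected SBEs (rather than poisoned data) would also need to save and restore the corrected value around the pattern test, which this sketch does not show.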