<div dir="ltr">Hi Richard<div><br></div><div>On the one hand, I wouldn't necessarily blame <span class="">memtest</span>: first, I'm sure it's clever enough to deal with ECC, and second, in my experience* applications are much better at detecting memory faults than <span class="">memtest</span> is. This might simply be because you spend much more time running apps than running <span class="">memtest</span>. There may also be an electrical reason. In the old days you could listen to your computer's hum (by turning up the volume on the TV I was using as a cheap CRT) and tell if it had gone into an infinite loop, because that made a very regular buzz, unlike normal operation. Electrically, a CPU sustains a barely controlled mess of noise, and under normal operation (apps) it exercises more failure modes than <span class="">memtest</span>, which is quite, but not entirely, like an infinite loop.</div>
<div><br></div><div>On the other hand, I once had an oddity where <span class="">memtest</span> apparently crashed, and a few days later I couldn't play a DVD and found that my region setting had changed. As too many region changes cause definite problems, I stopped running <span class="">memtest</span> on a regular basis as a preventive measure.</div>
<div><br></div><div>Nowadays I might be tempted to replace the <span class="">DIMMs</span> first and worry about running <span class="">memtest</span> later, but it also depends on what kind of box you've got.</div><div><br></div><div>
*That includes a whopping 2 cases of faulty memory; I guess I'm mostly a software guy.<br></div><div class="gmail_extra"><br clear="all"><div><a href="http://about.me/martin.e" target="_blank">Martin Ehrenstein</a><div>
<br></div></div>
<br><br><div class="gmail_quote">On 11 June 2014 18:51, Richard Hector <span dir="ltr">&lt;<a href="mailto:richard@walnut.gen.nz" target="_blank">richard@walnut.gen.nz</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hi all,<br>
<br>
I've got a (client's) machine that reports hardware errors, probably relating to memory. I'm running memtest86+ (latest version) on it; that hasn't shown anything so far. I suspect that may be because ECC corrects the errors before memtest86+ sees them, while Linux receives an exception or something and that's what it's logging.<br>
<br>
Does anyone know of better tools to diagnose this - perhaps ones that are likely to always hit the error, or ones that can get around ECC hiding the problem, or (best) ones that will identify a specific flakey DIMM (or chipset or cpu or whatever is causing the problem)?<br>
<br>
A sample of the errors is shown here:<br>
<a href="http://paste.debian.net/104039/" target="_blank">http://paste.debian.net/104039/</a><br>
<br>
Any hints very welcome :-)<br>
<br>
Thanks,<br>
Richard<span class="HOEnZb"><font color="#888888"><br>
<br>
<br>
-- <br>
Wellington Linux Users Group Mailing List: <a href="mailto:wellylug@lists.wellylug.org.nz" target="_blank">wellylug@lists.wellylug.org.nz</a><br>
To Leave: <a href="http://lists.wellylug.org.nz/mailman/listinfo/wellylug" target="_blank">http://lists.wellylug.org.nz/mailman/listinfo/wellylug</a><br>
</font></span></blockquote></div><br></div></div>