[wellylug] hardware errors

Martin Ehrenstein martin.ehrenstein at gmail.com
Wed Jun 11 20:45:43 NZST 2014


Hi Richard

On the one hand I wouldn't necessarily blame memtest. First, I'm sure it's
clever enough to deal with ECC. Second, in my experience* applications are
much better at detecting memory faults than memtest. This might simply be
because you spend much more time running apps than running memtest. There
may also be an electrical reason: in the old days you could listen to your
computer's hum (by turning up the volume on the TV I was using as a cheap
CRT) and tell if it had gone into an infinite loop, because that made a very
regular buzz, unlike normal operation. Electrically, a CPU sustains a barely
controlled mess of noise, and under normal operation (apps) it exercises
more failure modes than memtest, which is quite, but not entirely, like an
infinite loop.

On the other hand, I once had an oddity where memtest apparently crashed,
and a few days later I couldn't play a DVD because my region setting had
changed. As too many region changes cause definite problems, I subsequently
stopped running memtest on a regular basis as a preventive measure.

Nowadays I might be tempted to replace DIMMs first and worry about running
memtest later, but it also depends on what kind of box you've got.
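
If the box runs Linux with an EDAC driver loaded for its memory controller,
the kernel's corrected/uncorrected error counters may help decide which DIMM
to swap first. What follows is only a rough sketch, not something I have run
on your machine: it assumes the counters are exposed under
/sys/devices/system/edac/mc in the per-csrow layout (ce_count and ue_count
files), and the function name read_edac_counts is made up for illustration.

```python
from pathlib import Path

# Assumed location of the EDAC counters; only present when an EDAC
# driver for the memory controller is loaded.
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root=EDAC_ROOT):
    """Return {"mc0/csrow1": {"ce": 3, "ue": 0}, ...} for every csrow found.

    "ce" is the corrected-error count and "ue" the uncorrected-error count.
    A value of None means the counter file was missing or unreadable.
    """
    counts = {}
    for csrow in sorted(Path(root).glob("mc*/csrow*")):
        entry = {}
        for key, filename in (("ce", "ce_count"), ("ue", "ue_count")):
            try:
                entry[key] = int((csrow / filename).read_text().strip())
            except (OSError, ValueError):
                entry[key] = None
        counts[f"{csrow.parent.name}/{csrow.name}"] = entry
    return counts


if __name__ == "__main__":
    for row, c in read_edac_counts().items():
        print(row, "corrected:", c["ce"], "uncorrected:", c["ue"])
```

A row whose corrected count keeps climbing would be the first suspect. How a
csrow maps to a physical slot depends on the board, so treat the result as a
hint rather than a verdict.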

* That includes a whopping 2 cases of faulty memory; I guess I'm mostly a
software guy.

Martin Ehrenstein <http://about.me/martin.e>



On 11 June 2014 18:51, Richard Hector <richard at walnut.gen.nz> wrote:

> Hi all,
>
> I've got a (client's) machine that reports hardware errors, probably
> relating to memory. I'm running memtest86+ (latest version) on it; that
> hasn't shown anything so far. I suspect that may be because ECC corrects
> the errors before memtest86+ sees them, while Linux receives an exception
> or something and that's what it's logging.
>
> Does anyone know of better tools to diagnose this - perhaps ones that are
> likely to always hit the error, or ones that can get around ECC hiding the
> problem, or (best) ones that will identify a specific flakey DIMM (or
> chipset or cpu or whatever is causing the problem)?
>
> A sample of the errors is shown here:
> http://paste.debian.net/104039/
>
> Any hints very welcome :-)
>
> Thanks,
> Richard
>
>
> --
> Wellington Linux Users Group Mailing List: wellylug at lists.wellylug.org.nz
> To Leave:  http://lists.wellylug.org.nz/mailman/listinfo/wellylug
>