[wellylug] hardware errors

Daniel Reurich daniel at centurion.net.nz
Wed Jun 11 20:29:43 NZST 2014


Hi Richard,

I don't believe it's a DIMM issue (or if it is, ECC isn't handling it), 
as in my experience ECC-reported errors are much noisier and are clearly 
marked as ECC.
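
If you want to check that directly, here is a rough sketch rather than a 
tested tool.  It assumes the kernel's EDAC driver for this chipset is 
loaded, in which case the per-controller corrected (ce_count) and 
uncorrected (ue_count) ECC error counters show up under 
/sys/devices/system/edac/mc/.  The function name read_edac_counts is just 
something I made up for the example.

```python
#!/usr/bin/env python3
"""Print ECC error counters exposed by the Linux EDAC subsystem (sketch)."""
from pathlib import Path

# Assumption: the EDAC driver for this chipset is loaded, so the kernel
# exposes one directory per memory controller (mc0, mc1, ...) here.
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root=EDAC_ROOT):
    """Return {controller: {"ce_count": int|None, "ue_count": int|None}}."""
    counts = {}
    for mc in sorted(Path(root).glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):
            try:
                entry[name] = int((mc / name).read_text().strip())
            except (OSError, ValueError):
                # Counter missing or unreadable: record it as unknown.
                entry[name] = None
        counts[mc.name] = entry
    return counts


if __name__ == "__main__":
    counts = read_edac_counts()
    if not counts:
        print("No EDAC memory controllers found (driver not loaded?)")
    for mc, entry in counts.items():
        print(f"{mc}: corrected={entry['ce_count']} "
              f"uncorrected={entry['ue_count']}")
```

If the counters stay at zero while the errors keep being logged, that 
would fit with it not being an ECC-handled DIMM problem.  If nothing is 
listed at all, EDAC probably isn't loaded and the result tells you nothing.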

By the looks of it, the issue is either an L3 cache fault or a 
northbridge memory bus fault.  It could also be a kernel bug.  Have you 
recently updated the kernel?  If memtest doesn't pick it up as a RAM 
issue, then the CPU looks to be implicated.  You could try installing 
cpuburn and running some load tests with that.

What are the hardware specs of this machine?

Cheers,
	Daniel

On 11/06/14 18:51, Richard Hector wrote:
> Hi all,
>
> I've got a (client's) machine that reports hardware errors, probably
> relating to memory. I'm running memtest86+ (latest version) on it; that
> hasn't shown anything so far. I suspect that may be because ECC corrects
> the errors before memtest86+ sees them, while Linux receives an exception
> or something and that's what it's logging.
>
> Does anyone know of better tools to diagnose this - perhaps ones that
> are likely to always hit the error, or ones that can get around ECC
> hiding the problem, or (best) ones that will identify a specific flakey
> DIMM (or chipset or cpu or whatever is causing the problem)?
>
> A sample of the errors is shown here:
> http://paste.debian.net/104039/
>
> Any hints very welcome :-)
>
> Thanks,
> Richard
>
>


-- 
Daniel Reurich
Centurion Computer Technology (2005) Ltd.
021 797 722


