[wellylug] hardware errors
Daniel Reurich
daniel at centurion.net.nz
Wed Jun 11 20:29:43 NZST 2014
Hi Richard,
I don't believe it's a DIMM issue (or at least not one handled by ECC),
as in my experience ECC-reported errors are much noisier and are clearly
marked as ECC.
By the looks of it, the issue is in either the L3 cache or the
northbridge memory bus. It could also be a kernel bug. Have you recently
updated the kernel? If memtest doesn't pick it up as a RAM issue, then
the CPU seems to be implicated. You could try installing cpuburn and
running some load tests with that.
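To check whether the logged errors are marked as ECC at all, something
like the following rough Python sketch could tally them. It assumes the
kernel prefixes these messages with "[Hardware Error]" and that ECC/EDAC
reports contain the word "EDAC" or "ECC"; the function name and the
category labels are made up for illustration, and the patterns may need
adjusting to match what this machine actually logs.

```python
import re
from collections import Counter

# Assumption: hardware error messages carry the "[Hardware Error]" prefix.
HW_ERR = re.compile(r"\[Hardware Error\]")
# Assumption: ECC-related reports mention "EDAC" or "ECC" as a whole word.
ECC_HINT = re.compile(r"\b(EDAC|ECC)\b", re.IGNORECASE)


def classify_hw_errors(lines):
    """Count log lines that look like ECC reports vs. other hardware errors.

    Lines matching neither pattern are ignored.
    """
    counts = Counter()
    for line in lines:
        if ECC_HINT.search(line):
            counts["ecc"] += 1
        elif HW_ERR.search(line):
            counts["other_hw"] += 1
    return counts
```

It could be fed a saved copy of the kernel log, for example
classify_hw_errors(open("kern.log")), where kern.log is just a
placeholder file name. A high "other_hw" count with no "ecc" entries
would fit the cache/bus/CPU theory better than a bad DIMM, though it
wouldn't prove it.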
What are the hardware specs of this machine?
Cheers,
Daniel
On 11/06/14 18:51, Richard Hector wrote:
> Hi all,
>
> I've got a (client's) machine that reports hardware errors, probably
> relating to memory. I'm running memtest86+ (latest version) on it; that
> hasn't shown anything so far. I suspect that may be because ECC corrects
> the errors before memtest86+ sees them, while Linux receives an exception
> or something and that's what it's logging.
>
> Does anyone know of better tools to diagnose this - perhaps ones that
> are likely to always hit the error, or ones that can get around ECC
> hiding the problem, or (best) ones that will identify a specific flakey
> DIMM (or chipset or cpu or whatever is causing the problem)?
>
> A sample of the errors is shown here:
> http://paste.debian.net/104039/
>
> Any hints very welcome :-)
>
> Thanks,
> Richard
>
>
--
Daniel Reurich
Centurion Computer Technology (2005) Ltd.
021 797 722