[wellylug] hardware errors
Ewen McNeill
wellylug at ewen.mcneill.gen.nz
Wed Jun 11 22:40:25 NZST 2014
On 11/06/14 20:35, Richard Hector wrote:
> It's a Sun Fire X2100 M2 - dual core opteron 1218, 2600MHz with 4G of RAM
According to:
http://docs.oracle.com/cd/E19121-01/sf.x2100m2/819-6591-11/Chap4.html#17924
http://docs.oracle.com/cd/E19121-01/sf.x2100m2/819-6591-11/Chap1.html
it takes unbuffered DDR2 RAM, in pairs, and needs to be populated from
slot 0 to slot 3 -- supporting 0.5/1/2 GB DIMMs. Some memory suppliers
seem to think faster DDR2 (eg, DDR2-667 --
http://www.memoryxsun.com/mtx5278aa.html) will work as well as DDR2-400,
but it does need to be ECC Unbuffered.
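
The population rules above (pairs, filled from slot 0 upward, 0.5/1/2 GB
DIMMs) can be sketched as a quick sanity check. This is only an
illustration of the rules as summarised here, not something taken from
the Sun documentation; the function name and the "size in MB or None"
representation are made up for the example.

```python
# Sizes the post lists as supported: 0.5, 1 and 2 GB DIMMs (in MB).
VALID_SIZES_MB = {512, 1024, 2048}


def check_dimm_layout(slots):
    """Check a proposed DIMM layout against the rules in the post.

    slots: list of 4 entries for slot 0..3, each a DIMM size in MB,
           or None for an empty slot (hypothetical representation).
    Returns a list of problem descriptions; an empty list means the
    layout passes these (simplified) checks.
    """
    if len(slots) != 4:
        return ["expected exactly 4 slots (slot 0 to slot 3)"]

    problems = []
    populated = [size is not None for size in slots]
    count = sum(populated)

    if count == 0:
        problems.append("no DIMMs installed")
    if count % 2 != 0:
        problems.append("DIMMs must be installed in pairs")
    # Filled from slot 0 upward: no empty slot before a populated one.
    if populated != sorted(populated, reverse=True):
        problems.append("slots must be filled from slot 0 upward, no gaps")
    for index, size in enumerate(slots):
        if size is not None and size not in VALID_SIZES_MB:
            problems.append(f"slot {index}: unsupported size {size} MB")
    return problems
```

For example, check_dimm_layout([1024, 1024, 1024, 1024]) returns an
empty list, which is one way of reaching the 4G mentioned in the quoted
message. It does not check ECC/unbuffered type or DIMM speed.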
If the RAM were triggering ECC errors there's a reasonable chance
memtest86+ would show them -- it does hook into the reporting mechanism
of several servers and should, eg, at least catch the NMI reports.
Failing that, any RAM ECC errors should also show up in the server's
out of band management logs. (And the out of band management should be
able to identify the specific DIMM with issues, since it knows the
memory layout to physical DIMM mapping.)
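
On a running Linux system, another place ECC counts can show up is the
kernel's EDAC sysfs tree. A minimal sketch, assuming an EDAC driver for
the memory controller is actually loaded on this machine (not confirmed
here) and that it exposes the usual ce_count/ue_count files; the
function name is made up for the example:

```python
from pathlib import Path


def read_edac_counts(edac_root="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce_count": n, "ue_count": n}} from EDAC sysfs.

    ce_count is corrected errors, ue_count is uncorrected errors.
    A value of None means the file was missing or unreadable.
    Returns an empty dict if the EDAC tree is not present at all.
    """
    root = Path(edac_root)
    if not root.is_dir():
        return {}

    results = {}
    for controller in sorted(root.glob("mc[0-9]*")):
        counts = {}
        for name in ("ce_count", "ue_count"):
            try:
                counts[name] = int((controller / name).read_text().strip())
            except (OSError, ValueError):
                counts[name] = None
        results[controller.name] = counts
    return results


if __name__ == "__main__":
    counts = read_edac_counts()
    if not counts:
        print("no EDAC memory controllers found (driver not loaded?)")
    for controller, values in counts.items():
        print(controller, values)
```

A corrected-error count that keeps climbing between runs would point
back at the RAM; an empty result just means EDAC isn't reporting, not
that the RAM is fine.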
However given the reports, and age of the hardware (IIRC the Sun Fire
X2100 M2 was sold from around 8 years ago to around 5 years ago), I'd
also be wondering about mechanical causes. For instance, a CPU that
is running a bit hot due to a fan that's Rather Old (and, eg, dusty)
might periodically "miss a bit" when running more than usual. Either
checking the out of band management (IIRC that'll display fan speeds) or
a physical inspection might be warranted. (It could be back of case
fans or mid-case fans, CPU fan or even a power supply fan causing the
rails to dip a bit under load.) If it gets worse when, eg, something is
keeping the CPU busy that'd definitely be my pick.
I'd be inclined to check for fans/heat issues before doing anything more
with the RAM, given the age.
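
If the out of band management is reachable over IPMI, the fan check can
be scripted. A rough sketch that parses output in the pipe-separated
style printed by "ipmitool sdr type Fan" and flags slow fans. The sensor
names, readings and the 2000 RPM threshold below are invented for
illustration; the real names and sensible minimums for this server
would need to come from its own output and documentation.

```python
def parse_fan_readings(sdr_output):
    """Parse `ipmitool sdr type Fan` style text into {sensor_name: rpm}.

    Expects lines like: "NAME | 30h | ok | 29.1 | 5400 RPM".
    Lines without an RPM reading (eg, "No Reading") are skipped.
    """
    readings = {}
    for line in sdr_output.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 5:
            continue
        name, reading = fields[0], fields[-1]
        parts = reading.split()
        if len(parts) == 2 and parts[1] == "RPM":
            try:
                readings[name] = float(parts[0])
            except ValueError:
                pass
    return readings


def slow_fans(readings, min_rpm):
    """Return sorted names of fans reporting below min_rpm."""
    return sorted(name for name, rpm in readings.items() if rpm < min_rpm)


# Hypothetical sample output, NOT captured from a real X2100 M2.
SAMPLE = """\
FAN1 | 30h | ok  | 29.1 | 5400 RPM
FAN2 | 31h | ok  | 29.2 | 1200 RPM
FAN3 | 32h | ns  | 29.3 | No Reading
"""

if __name__ == "__main__":
    fans = parse_fan_readings(SAMPLE)
    print(fans)
    print("below threshold:", slow_fans(fans, min_rpm=2000))
```

In practice the text would come from running ipmitool against the
service processor rather than from SAMPLE. A fan with no reading at all
is worth a physical look too, since the parser above silently skips it.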
It may be that the upgrade to Wheezy simply provided a mechanism to see
an issue that's been happening for a while, but hasn't been logged
outside the out of band management until now.
Ewen