[wellylug] High load averages but no apparent cause

Daniel Reurich daniel at centurion.net.nz
Wed Mar 24 15:44:54 NZDT 2010


Hi David,

Just a hunch: I wonder if your hard drives are going into standby.

It's normally a BIOS setting, but can be overridden by issuing hdparm
-S0 -K1 for each drive.  (I suggest you read the manual for hdparm first,
but -S sets the standby timeout, with 0 disabling it, and -K1 asks the
drive to keep that setting across a reset; not all drives honour it).
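
Something along these lines would print the commands I have in mind, so
you can read them over before running anything as root.  It is only a
rough sketch: the device names are my assumption from your iostat output,
hdparm_commands is just a name I made up, and -K1 is the form of -K that
sets the keep-over-reset flag (check the man page for your hdparm
version).  -C merely reports the drive's current power state.

```python
def hdparm_commands(devices, keep_over_reset=True):
    """Build hdparm command lines to check, then disable, drive standby."""
    commands = []
    for dev in devices:
        # -C only reports the power state (active/idle, standby, sleeping).
        commands.append(["hdparm", "-C", dev])
        # -S0 disables the standby (spin-down) timeout.
        disable = ["hdparm", "-S0"]
        if keep_over_reset:
            # -K1 sets keep_features_over_reset; not every drive supports it.
            disable.append("-K1")
        disable.append(dev)
        commands.append(disable)
    return commands


if __name__ == "__main__":
    # Assumed device names, taken from the iostat output quoted below.
    for cmd in hdparm_commands(["/dev/sda", "/dev/sdb", "/dev/sdc"]):
        print(" ".join(cmd))
```

If -C shows "standby" on sda or sdb just before one of the stalls, that
would back up the hunch.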

Also make sure you haven't got laptop mode or any other "powersaving"
utilities installed or configured.
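
A quick way to see whether the kernel's laptop mode is switched on is to
read /proc/sys/vm/laptop_mode, where 0 means off.  A minimal sketch,
assuming that file exists on your kernel (laptop_mode_enabled is just a
helper name I picked); it does not detect other power-saving daemons:

```python
from pathlib import Path

LAPTOP_MODE = Path("/proc/sys/vm/laptop_mode")


def laptop_mode_enabled(text):
    """Interpret the contents of /proc/sys/vm/laptop_mode (0 means off)."""
    return int(text.strip()) != 0


if __name__ == "__main__":
    if LAPTOP_MODE.exists():
        enabled = laptop_mode_enabled(LAPTOP_MODE.read_text())
        print("laptop_mode is", "ENABLED" if enabled else "off")
    else:
        print("no laptop_mode knob found on this kernel")
```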

Regards,
	Daniel.
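
P.S. If you want to catch the stalls as they happen, something like the
following could flag devices that sit at ~100% utilisation while doing
next to no I/O, which is the pattern in your iostat output.  It is only a
sketch: the thresholds are my guesses, stalled_devices is a made-up name,
and it assumes unwrapped "iostat -x" rows in the same column order as
yours.  The sample rows are copied from your mail.

```python
def stalled_devices(iostat_text, util_min=90.0, iops_max=5.0):
    """Return devices that look saturated while doing almost no I/O.

    Expects device rows in this column order (as in the quoted output):
    Device rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
    """
    stalled = []
    for line in iostat_text.splitlines():
        fields = line.split()
        if len(fields) != 12:
            continue  # not a device row
        try:
            values = [float(v) for v in fields[1:]]
        except ValueError:
            continue  # header line or other text
        reads, writes, util = values[2], values[3], values[10]
        # Thresholds are assumptions, not tuned values.
        if util >= util_min and reads + writes <= iops_max:
            stalled.append(fields[0])
    return stalled


sample = """\
sda 0.00 1.75 0.25 0.50 2.00 14.00 21.33 10.73 4300.00 1333.33 100.00
sdb 0.00 1.75 0.50 0.25 4.00 28.00 42.67 10.10 3340.00 1333.33 100.00
sdc 1.50 0.00 0.25 0.50 14.00 4.00 24.00 0.01 10.00 10.00 0.75
md3 0.00 0.00 0.25 2.75 2.00 22.00 8.00 0.00 0.00 0.00 0.00
"""
print(stalled_devices(sample))  # prints ['sda', 'sdb']
```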


On Wed, 2010-03-24 at 13:56 +1300, David Harrison wrote:
> Hi,
> Thanks for taking an interest.
> 
> 
> Attached are the "smartctl -a" outputs for the three drives.
> 
> 
> If you spot anything let me know.
> In the meantime we are arranging to have a server shipped up from
> Wellington to replace this one.
> 
> 
> 
> 
> David
> 
> 
> 
> 
> On Wed, Mar 24, 2010 at 12:47 PM, Daniel Reurich
> <daniel at centurion.net.nz> wrote:
>         Can you post the full output of smartctl -a for each drive
>         (offlist maybe?).
>         
>         Daniel
>         
>         
>         
>         On Wed, 2010-03-24 at 11:54 +1300, David Harrison wrote:
>         > Just a follow up, this thread describes my problem exactly:
>         >
>         > http://centos.org/modules/newbb/viewtopic.php?viewmode=flat&order=ASC&topic_id=22554&forum=37
>         >
>         >
>         > The short story is that even though smartctl reports no issue
>         > there is probably a hardware issue.
>         >
>         >
>         >
>         >
>         > Below is the output from iostat on the server when the disk
>         > problem is taking place.
>         > The utilisation of sda and sdb is 100% even though they are
>         > hardly doing anything.
>         > This lockup remains for 15-20 seconds until something in the
>         > kernel/hardware resets, and then it is happy again.
>         >
>         >
>         > Output of iostat while problem is taking place:
>         >
>         >
>         > avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>         >            0.38    0.00    0.00   99.62    0.00    0.00
>         >
>         >
>         > Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await   svctm  %util
>         > sda               0.00     1.75    0.25    0.50     2.00    14.00    21.33    10.73 4300.00 1333.33 100.00
>         > sdb               0.00     1.75    0.50    0.25     4.00    28.00    42.67    10.10 3340.00 1333.33 100.00
>         > sdc               1.50     0.00    0.25    0.50    14.00     4.00    24.00     0.01   10.00   10.00   0.75
>         > md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00   0.00
>         > md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00   0.00
>         > md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00   0.00
>         > md3               0.00     0.00    0.25    2.75     2.00    22.00     8.00     0.00    0.00    0.00   0.00
>         >
>         >
>         >
>         >
>         > Output of iostat when problem is not taking place:
>         >
>         >
>         > avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>         >            0.62    0.00    4.62    6.50    0.00   88.25
>         >
>         >
>         > Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await   svctm  %util
>         > sda               4.50    14.25    3.75   13.00    66.00   234.00    17.91     0.08    4.93    4.63   7.75
>         > sdb               6.50    12.25    5.25   12.00    94.00   210.00    17.62     0.12    7.25    6.09  10.50
>         > sdc               7.25    11.00    4.75   12.25    96.00   202.00    17.53     0.10    5.88    4.85   8.25
>         > md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00   0.00
>         > md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00   0.00
>         > md2               0.00     0.00    0.00    1.50     0.00    12.00     8.00     0.00    0.00    0.00   0.00
>         > md3               0.00     0.00    1.75   33.00    14.00   264.00     8.00     0.00    0.00    0.00   0.00
>         >
>         >
>         >
>         >
>         > This suggests that either there's a problem with sda and sdb, or
>         > there's an issue with the SATA controller which is leaving both
>         > hanging. Annoying either way...
>         >
>         >
>         >
>         > David
>         
>         
>         
>         --
>         Daniel Reurich.
>         
>         Centurion Computer Technology (2005) Ltd
>         Mobile 021 797 722
>         
>         
>         
>         
>         --
>         
>         
>         Wellington Linux Users Group Mailing List:
>         wellylug at lists.wellylug.org.nz
>         To Leave:
>          http://lists.wellylug.org.nz/mailman/listinfo/wellylug
>         
> 
> 




