[wellylug] High load averages but no apparent cause

Daniel Reurich daniel at centurion.net.nz
Wed Mar 24 12:47:04 NZDT 2010


Can you post the full output of smartctl -a for each drive (offlist
maybe?).

Daniel


On Wed, 2010-03-24 at 11:54 +1300, David Harrison wrote:
> Just a follow-up, this thread describes my problem exactly:
> http://centos.org/modules/newbb/viewtopic.php?viewmode=flat&order=ASC&topic_id=22554&forum=37
> 
> 
> The short story is that even though smartctl reports no issue there is
> probably a hardware issue.
> 
> 
> 
> 
> Below is the output from iostat on the server when the disk problem is
> taking place.
> The utilisation of sda and sdb is 100% even though they are hardly
> doing anything.
> This lockup remains for 15-20 seconds until something in the
> kernel/hardware resets, and then it is happy again.
> 
> 
> Output of iostat while problem is taking place:
> 
> 
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            0.38    0.00    0.00   99.62    0.00    0.00
> 
> 
> Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
> sda               0.00     1.75    0.25    0.50     2.00    14.00    21.33    10.73 4300.00 1333.33 100.00
> sdb               0.00     1.75    0.50    0.25     4.00    28.00    42.67    10.10 3340.00 1333.33 100.00
> sdc               1.50     0.00    0.25    0.50    14.00     4.00    24.00     0.01   10.00  10.00   0.75
> md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
> md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
> md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
> md3               0.00     0.00    0.25    2.75     2.00    22.00     8.00     0.00    0.00   0.00   0.00
> 
> 
> 
> 
> Output of iostat when problem is not taking place:
> 
> 
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            0.62    0.00    4.62    6.50    0.00   88.25
> 
> 
> Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
> sda               4.50    14.25    3.75   13.00    66.00   234.00    17.91     0.08    4.93   4.63   7.75
> sdb               6.50    12.25    5.25   12.00    94.00   210.00    17.62     0.12    7.25   6.09  10.50
> sdc               7.25    11.00    4.75   12.25    96.00   202.00    17.53     0.10    5.88   4.85   8.25
> md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
> md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
> md2               0.00     0.00    0.00    1.50     0.00    12.00     8.00     0.00    0.00   0.00   0.00
> md3               0.00     0.00    1.75   33.00    14.00   264.00     8.00     0.00    0.00   0.00   0.00
> 
> 
> 
> 
> This suggests that either there's a problem with sda and sdb, or
> there's an issue with the SATA controller which is leaving both
> hanging. Annoying either way...
> 
> 
> 
> David
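
The reasoning in the quoted message (a disk showing 100% utilisation while completing almost no requests is probably hung, not busy) could be checked automatically with something like the rough Python sketch below. It assumes each device's iostat values have been joined onto a single line and uses the column layout from the output above. The function names and the thresholds (95% utilisation, 5 requests/s, 1000 ms await) are arbitrary choices for illustration, not anything iostat or the kernel defines.

```python
# Column order of the extended iostat output quoted in this thread.
COLUMNS = ["rrqm/s", "wrqm/s", "r/s", "w/s", "rsec/s", "wsec/s",
           "avgrq-sz", "avgqu-sz", "await", "svctm", "%util"]


def parse_rows(text):
    """Return {device: {column: value}} for every well-formed device row."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # A device row is a name followed by exactly one number per column.
        if len(fields) != len(COLUMNS) + 1:
            continue
        try:
            values = [float(f) for f in fields[1:]]
        except ValueError:
            continue  # header line or other non-numeric text
        stats[fields[0]] = dict(zip(COLUMNS, values))
    return stats


def stalled_devices(stats, util_min=95.0, iops_max=5.0, await_min=1000.0):
    """List devices that look saturated yet complete almost no requests.

    The thresholds are illustrative assumptions, not established limits.
    """
    flagged = []
    for dev, s in stats.items():
        iops = s["r/s"] + s["w/s"]
        if (s["%util"] >= util_min and iops <= iops_max
                and s["await"] >= await_min):
            flagged.append(dev)
    return sorted(flagged)


# Rows copied from the "problem is taking place" sample above.
SAMPLE = """\
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 1.75 0.25 0.50 2.00 14.00 21.33 10.73 4300.00 1333.33 100.00
sdb 0.00 1.75 0.50 0.25 4.00 28.00 42.67 10.10 3340.00 1333.33 100.00
sdc 1.50 0.00 0.25 0.50 14.00 4.00 24.00 0.01 10.00 10.00 0.75
md3 0.00 0.00 0.25 2.75 2.00 22.00 8.00 0.00 0.00 0.00 0.00
"""

if __name__ == "__main__":
    print(stalled_devices(parse_rows(SAMPLE)))
```

Run against the sample taken during the lockup, this flags sda and sdb but not sdc, which matches the conclusion in the quoted message. It only narrows down which devices stall together; it cannot tell a failing drive from a failing controller.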


-- 
Daniel Reurich.

Centurion Computer Technology (2005) Ltd
Mobile 021 797 722




