[wellylug] High load averages but no apparent cause

Mon Mar 22 22:51:06 NZDT 2010

The top output is interesting, as it shows an iowait of 85.4%.
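top derives %wa by sampling the CPU counters on the aggregate "cpu" line of /proc/stat twice and comparing them: iowait is the share of elapsed CPU time spent idle while disk I/O was outstanding. A minimal Python sketch of that calculation, assuming a Linux host and the field order documented in proc(5); the function names are hypothetical and top's own arithmetic may differ in detail:

```python
import os
import time

# Order of the counters on the aggregate "cpu" line of /proc/stat (proc(5)).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Return the first eight jiffy counters of a /proc/stat "cpu" line.

    guest/guest_nice are ignored because the kernel already counts them in
    user/nice; fields missing on older kernels are treated as 0.
    """
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def iowait_percent(before, after):
    """Share of CPU time spent in iowait between two samples, in percent."""
    total = sum(after[f] - before[f] for f in FIELDS)
    if total <= 0:
        return 0.0
    return 100.0 * (after["iowait"] - before["iowait"]) / total


def read_cpu(path="/proc/stat"):
    with open(path) as fh:
        return parse_cpu_line(fh.readline())   # first line is the "cpu" total


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    first = read_cpu()
    time.sleep(1)
    second = read_cpu()
    print(f"iowait over the last second: {iowait_percent(first, second):.1f}%")
```

Run on the server while the load is high; a figure close to what top reported would confirm the CPUs are mostly sitting idle waiting on the disks.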

To see the current state with vmstat, you need to run it like this:
# vmstat 1 10

The first line of results is an average since the last reboot, and the
following lines show the state every second, for 10 samples (see the man page).
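The same idea, scripted: a Python sketch that parses vmstat's text output, discards the since-boot row, and averages a column such as `wa` over the live samples. It assumes procps vmstat's default layout (a "procs" banner, an "r b ..." column header, then numeric rows); `live_rows`, `mean` and `sample_vmstat` are illustrative names, not part of any tool.

```python
import subprocess


def live_rows(vmstat_output):
    """Parse `vmstat` text output into a list of per-interval dicts.

    vmstat prints two header lines and then a first data row that is the
    average since boot; that row is dropped so only live samples remain.
    """
    header = None
    rows = []
    for line in vmstat_output.splitlines():
        fields = line.split()
        if fields[:2] == ["r", "b"]:
            header = fields                      # column names: r b swpd ... wa
        elif header and fields and fields[0].isdigit():
            rows.append(dict(zip(header, map(int, fields))))
    return rows[1:]                              # rows[0] is the since-boot average


def mean(rows, column):
    """Average of one column (e.g. "wa") over the live rows."""
    return sum(r[column] for r in rows) / len(rows) if rows else 0.0


def sample_vmstat(delay=1, count=10):
    """Run `vmstat DELAY COUNT` and return its live rows."""
    out = subprocess.run(["vmstat", str(delay), str(count)],
                         capture_output=True, text=True, check=True).stdout
    return live_rows(out)
```

On the server, `print(mean(sample_vmstat(), "wa"))` gives the average iowait over roughly ten seconds, which is the figure to compare against top rather than the since-boot 6%.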

My initial thought is that it may be caused by the periodic RAID scrub
(which runs a RAID check over the entire array), so I'd check for that
via either "cat /proc/mdstat" or "mdadm -D /dev/md*".

Also check that you don't have more than one disk at a time running SMART
self-tests, as that really kills server performance.
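To see whether several disks are running SMART self-tests at once, something like the following Python sketch calls `smartctl -c` (from smartmontools; usually needs root) on each disk and looks for an in-progress self-test. The matched wording is an assumption based on typical smartctl output for ATA drives; SCSI and other drive types report it differently, and the helper names are hypothetical.

```python
import re
import subprocess

# smartctl -c on ATA disks typically prints something like:
#   Self-test execution status:  ( 249) Self-test routine in progress...
#                                       90% of test remaining.
# (wording assumed; other drive types differ)
IN_PROGRESS_RE = re.compile(
    r"Self-test routine in progress.*?(\d+)% of test remaining", re.DOTALL)


def remaining_percent(smartctl_output):
    """Return % of the self-test still to run, or None if none is running."""
    m = IN_PROGRESS_RE.search(smartctl_output)
    return int(m.group(1)) if m else None


def busy_disks(devices):
    """Run `smartctl -c` on each device; map those mid-self-test to % left."""
    busy = {}
    for dev in devices:
        out = subprocess.run(["smartctl", "-c", dev],
                             capture_output=True, text=True).stdout
        pct = remaining_percent(out)
        if pct is not None:
            busy[dev] = pct
    return busy
```

With the disks from the iostat output in this thread, `busy_disks(["/dev/sda", "/dev/sdb", "/dev/sdc"])` returning more than one entry would mean overlapping self-tests.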

Daniel Reurich

On Mon, 2010-03-22 at 17:12 +1300, David Harrison wrote:
> I thought it was I/O bound too, as it is running a software RAID5 array.
> I would like it to be better, but the client can't afford hardware
> upgrades right now.
> 
> Does 6% IO wait time (from vmstat) constitute really bad disk
> performance?
> I've got systems with much higher wait times that have far lower
> loads.
> 
> 
> Here's the header output from top (the two active processes are
> registering 1% cpu load each):
> 
> top - 17:05:54 up 2 days, 20:25,  1 user,  load average: 2.74, 1.17, 0.74
> Tasks:  71 total,   2 running,  69 sleeping,   0 stopped,   0 zombie
> Cpu(s):  0.0%us,  0.1%sy,  0.0%ni, 14.3%id, 85.4%wa,  0.0%hi,  0.1%si,  0.0%st
> Mem:   2074112k total,  2020768k used,    53344k free,    13128k buffers
> Swap:  3903608k total,      828k used,  3902780k free,  1770188k cached
>
> Here's the vmstat output:
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  0  0    828  53416  13228 1770304    0    0    12    22    0   24  0  4 90  6
> 
> 
> And finally, iostat:
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            0.28    0.00    4.47    5.73    0.00   89.52
> 
> 
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda               4.12        22.55        56.06    5558094   13814570
> sdb               3.79        22.91        56.63    5646538   13955010
> sdc               4.06        23.30        56.80    5742544   13997936
> md0               0.00         0.02         0.00       3756         10
> md1               0.00         0.01         0.01       1592       2176
> md2               1.26         2.60         9.28     640370    2287456
> md3               9.40        46.87        58.95   11551194   14527312
>
> On Mon, Mar 22, 2010 at 5:02 PM, Daniel Reurich
> <daniel at centurion.net.nz> wrote:
>         
>         On Mon, 2010-03-22 at 16:49 +1300, David Harrison wrote:
>         > Hi,
>         > Has anyone experienced high load averages but not been able to see
>         > the processes that are causing them?
>         >
>         >
>         > I've got an Ubuntu Server 9.10 instance whose load average ranges
>         > between 1.0 and 3.0 for most of the day, yet tools like top and
>         > iostat don't reveal any issues.
>         > e.g. the load average can be up around 1.5 whilst the busiest
>         > process shown in top is sitting at 5% of the CPU.
>         >
>         >
>         > Does anyone know of any other good tools for identifying the cause
>         > of server load if the obvious ones fail?
>         >
>         
>         What's the wait state like (in top it's the %wa value)?
>         
>         Chances are that you have some serious I/O blocking going on, which
>         could be a slow or failing hard disk or something like that.
>         
>         
>         
>         --
>         Daniel Reurich.
>         
>         Centurion Computer Technology (2005) Ltd
>         Mobile 021 797 722
>         
>         --
>         Wellington Linux Users Group Mailing List:
>         wellylug at lists.wellylug.org.nz
>         To Leave:
>          http://lists.wellylug.org.nz/mailman/listinfo/wellylug
>         
> 
> 
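On the original question of load with no visible CPU hog: Linux counts tasks in uninterruptible sleep (state D, typically waiting on disk I/O) in the load average alongside runnable (R) tasks, so a load of 2 to 3 with almost no CPU use and a high %wa usually means a few tasks blocked on the disks. A minimal Python sketch that lists such tasks by reading /proc/<pid>/stat; it looks only at processes (including kernel threads), not at individual threads under /proc/<pid>/task, and the helper names are hypothetical:

```python
import os


def parse_stat(line):
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the last ')' rather than on whitespace.
    """
    lparen = line.index("(")
    rparen = line.rindex(")")
    pid = int(line[:lparen])
    comm = line[lparen + 1:rparen]
    state = line[rparen + 2]            # first field after ") "
    return pid, comm, state


def tasks_in_states(states=("R", "D"), proc="/proc"):
    """List (pid, comm, state) for processes currently in the given states."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as fh:
                info = parse_stat(fh.read())
        except OSError:
            continue                    # process exited while we were scanning
        if info[2] in states:
            found.append(info)
    return found


if __name__ == "__main__" and os.path.isdir("/proc/self"):
    for pid, comm, state in tasks_in_states(("D",)):
        print(f"{pid:>7} {state} {comm}")
```

Running it a few times while the load is up should show which processes or kernel threads keep turning up in state D.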

-- 
Daniel Reurich.

Centurion Computer Technology (2005) Ltd
Mobile 021 797 722