[wellylug] High load averages but no apparent cause

David Harrison david.harrison at stress-free.co.nz
Tue Mar 23 08:50:43 NZDT 2010


Thanks Daniel, that switch for vmstat is very handy, and I'd completely missed
the I/O wait value in top.
Googling based on your comments also turned up this page, which is very
useful:
http://strugglers.net/wiki/Linux_performance_tuning

The problem is certainly looking like an intermittent I/O issue.
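
To try to catch the culprit while the intermittent I/O is actually
happening, I'm planning to leave something like this running (assuming
iotop is available on 9.10; the interval, iteration count and log path
are just examples):

  # Log only processes that are actually doing I/O, in batch mode,
  # every 5 seconds for an hour (720 samples)
  sudo iotop -o -b -d 5 -n 720 >> /tmp/iotop.log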


Has anyone had experience with the performance boost from a dedicated PCI-X
SATA controller for software RAID?

The server in question is a bog-standard HP ML110.
It isn't up to their needs, but it was a recent purchase by the previous IT
guys, so I'm afraid it is staying.

For practical reasons I want to keep the software RAID-5 (3x1TB drives), but
would putting some (or all) of these disks onto a dedicated
controller alleviate the I/O issue?

i.e. Is it worth recommending a $400 PCI-X SATA controller for the box, or
is that money better left on the table for a new server (ML310/330) in
twelve months' time?
(My concern is that the card goes in and the problem stays exactly the same.)
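
Before spending anything I'm going to try to confirm that the spindles
themselves are the bottleneck rather than the on-board controller.
Something along these lines (assuming the sysstat package is installed;
the 5-second interval is arbitrary):

  # Extended per-device statistics, sampled every 5 seconds
  iostat -x 5
  # If %util sits near 100% on all three drives during the slow spells,
  # the disks are saturated and a new controller is unlikely to help much.
  # If await/%util is high on only one drive, that disk is the suspect.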


David



On Mon, Mar 22, 2010 at 10:51 PM, Daniel Reurich <daniel at centurion.net.nz> wrote:

> The top output is interesting as it shows an iowait of 87%.
>
> For the current state of vmstat you need to run it like this:
> # vmstat 1 10
>
> The first result is an average since the last reboot, and the following
> ones are the state every second for 10 samples.  (See the man page).
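>
> (Since the problem is intermittent, it can also be worth leaving a
> timestamped log running and checking the wa column afterwards; the
> interval and log path below are just examples:)
>
> # prefix each sample with the time so spikes can be matched up later
> vmstat -n 5 | while read line; do echo "$(date +%T) $line"; done >> /tmp/vmstat.log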
>
> My initial thought is that it may be caused by the periodic RAID scrub
> (which runs a check over the entire array), so I'd check that via either
> "cat /proc/mdstat" or "mdadm --detail /dev/md*".
>
> Also check that you don't have more than one disk at a time doing SMART
> self-tests, as that really kills server performance.
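>
> (A rough sketch of where I'd look, assuming a Debian/Ubuntu-style setup
> where the monthly scrub is driven by the mdadm checkarray cron job and
> SMART self-tests are scheduled by smartd:)
>
> # Is a check/resync running on any array right now?
> cat /proc/mdstat
> grep . /sys/block/md*/md/sync_action   # "idle", "check", "resync", ...
>
> # When is the periodic scrub scheduled?
> cat /etc/cron.d/mdadm
>
> # Any overlapping SMART self-test schedules (-s directives)?
> grep -v '^#' /etc/smartd.conf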
>
> Daniel Reurich
>
>
> On Mon, 2010-03-22 at 17:12 +1300, David Harrison wrote:
> > I thought it was I/O bound too as it is running a software RAID5
> > array.
> > I would like it to be better, but the client can't afford hardware
> > upgrades right now.
>
> >
> > Does 6% IO wait time (from vmstat) constitute really bad disk
> > performance?
> > I've got systems with much higher wait times that have far lower
> > loads.
> >
> >
> > Here's the header output from top (the two active processes are
> > registering 1% cpu load each):
> >
> > top - 17:05:54 up 2 days, 20:25,  1 user,  load average: 2.74, 1.17,
> > 0.74
> > Tasks:  71 total,   2 running,  69 sleeping,   0 stopped,   0 zombie
> > Cpu(s):  0.0%us,  0.1%sy,  0.0%ni, 14.3%id, 85.4%wa,  0.0%hi,  0.1%
> > si,  0.0%st
> > Mem:   2074112k total,  2020768k used,    53344k free,    13128k
> > buffers
> > Swap:  3903608k total,      828k used,  3902780k free,  1770188k
> > cached
> >
> >
> >
> >
> > Here's the vmstat output:
> > procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> >  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
> >  0  0    828  53416  13228 1770304    0    0    12    22    0   24  0  4 90  6
> >
> >
> > And finally iostat
> > avg-cpu:  %user   %nice %system %iowait  %steal   %idle
> >            0.28    0.00    4.47    5.73    0.00   89.52
> >
> >
> > Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> > sda               4.12        22.55        56.06    5558094   13814570
> > sdb               3.79        22.91        56.63    5646538   13955010
> > sdc               4.06        23.30        56.80    5742544   13997936
> > md0               0.00         0.02         0.00       3756         10
> > md1               0.00         0.01         0.01       1592       2176
> > md2               1.26         2.60         9.28     640370    2287456
> > md3               9.40        46.87        58.95   11551194   14527312
> >
> >
> >
> >
> > On Mon, Mar 22, 2010 at 5:02 PM, Daniel Reurich
> > <daniel at centurion.net.nz> wrote:
> >
> >         On Mon, 2010-03-22 at 16:49 +1300, David Harrison wrote:
> >         > Hi,
> >         > Has anyone experienced high load averages but not been able
> >         > to see the processes that are causing them?
> >         >
> >         >
> >         > I've got an Ubuntu Server 9.10 instance whose load average
> >         ranges
> >         > between 1.0 and 3.0 for most of the day, yet tools like top
> >         and iostat
> >         > don't reveal any issues.
> >         > i.e. The load averages can be up around 1.5 whilst the
> >         > busiest process shown in top is sitting at 5% of the CPU.
> >         >
> >         >
> >         > Anyone know of any other good tools for identifying the
> >         cause of
> >         > server load if the obvious ones fail?
> >         >
> >
> >         What's the wait state like (in top it's the %wa value)?
> >
> >         Chances are that you have some serious I/O blocking going on,
> >         which could be a slow or failing hard disk or something like
> >         that.
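> >
> >         (A quick way to sanity-check the disks themselves, assuming
> >         smartmontools is installed; the device name is just an example:)
> >
> >         # overall health verdict, then the raw attribute counters
> >         smartctl -H /dev/sda
> >         smartctl -A /dev/sda   # watch Reallocated_Sector_Ct and
> >                                # Current_Pending_Sector in particular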
> >
> >
> >
> >         --
> >         Daniel Reurich.
> >
> >         Centurion Computer Technology (2005) Ltd
> >         Mobile 021 797 722
> >
> >
> >
> >
>
>
> --
> Daniel Reurich.
>
> Centurion Computer Technology (2005) Ltd
> Mobile 021 797 722
>
>
>
>
>