Thanks Daniel, that switch for vmstat is very handy and I'd completely missed the I/O wait value in top.
Googling based on your comments also brought up this page, which is very handy:
http://strugglers.net/wiki/Linux_performance_tuning

The problem is certainly looking like an intermittent I/O issue.

Has anyone had experience with the performance boost from a dedicated PCI-X SATA controller for software RAID?

The server in question is a bog-standard HP ML110.
It isn't up to their needs, but it was a recent purchase by the previous IT guys, so I'm afraid it is staying.

For practical reasons I want to keep the software RAID-5 (3x 1TB drives), but would putting some (or all) of these disks onto a dedicated controller alleviate the I/O issue?

i.e. Is it worth recommending a $400 PCI-X SATA controller for the box, or is that money better left on the table for a new server (ML310/330) in twelve months' time?
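
Before recommending anything I'm planning to watch the per-disk numbers while the load average spikes - this is just what I intend to try, not a definitive answer. The extended iostat output should show whether the individual drives are actually saturated:

# iostat -x 5

If %util on sda/sdb/sdc sits near 100% and await climbs during the spikes, the drives themselves look like the bottleneck rather than the controller, and the money is probably better saved for the new box.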
(My concern being that the card goes in and the problem remains the same.)


David


On Mon, Mar 22, 2010 at 10:51 PM, Daniel Reurich <daniel@centurion.net.nz> wrote:
The top output is interesting as it shows an iowait of 87%.

For the current state you need to run vmstat like this:
# vmstat 1 10

The first result is an average since the last reboot, and the following
ones are the state every second for 10 samples. (See the man page.)
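
As a rough illustration (these numbers are invented, not from your box), an
I/O-bound machine tends to show tasks stuck in the "b" (blocked) column and a
large "wa" percentage:

 r  b ... us sy id wa
 0  3 ...  1  4 10 85

while a CPU-bound machine shows the time under "us"/"sy" instead.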

My initial thought is that the load may be caused by the periodic RAID scrub
(which runs a check over the entire array), so I'd check that via either
"cat /proc/mdstat" or "mdadm -D /dev/md*".
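
For example, on a stock Debian/Ubuntu install the monthly scrub is kicked off
from /etc/cron.d/mdadm, and you can see whether one is in flight right now with
something like:

# cat /sys/block/md3/md/sync_action
# grep -v '^#' /etc/cron.d/mdadm

(md3 is just an example here - check each of your arrays; "idle" means nothing
is running, "check" or "resync" means a scrub or rebuild is underway.)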

Also check that you don't have more than one disk at a time running SMART
self-tests, as that really kills server performance.
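
For example (assuming smartmontools is installed), the self-test history for
each disk can be pulled with:

# smartctl -l selftest /dev/sda

and the test schedule, if smartd is running one, normally lives in
/etc/smartd.conf (the -s directive), so it's worth checking the drives aren't
all set to test at the same hour.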

Daniel Reurich

On Mon, 2010-03-22 at 17:12 +1300, David Harrison wrote:
> I thought it was I/O bound too as it is running a software RAID5 array.
> I would like it to be better, but the client can't afford hardware
> upgrades right now.
>
> Does 6% IO wait time (from vmstat) constitute really bad disk performance?
> I've got systems with much higher wait times that have far lower loads.
>
> Here's the header output from top (the two active processes are
> registering 1% cpu load each):
>
> top - 17:05:54 up 2 days, 20:25, 1 user, load average: 2.74, 1.17, 0.74
> Tasks: 71 total, 2 running, 69 sleeping, 0 stopped, 0 zombie
> Cpu(s): 0.0%us, 0.1%sy, 0.0%ni, 14.3%id, 85.4%wa, 0.0%hi, 0.1%si, 0.0%st
> Mem: 2074112k total, 2020768k used, 53344k free, 13128k buffers
> Swap: 3903608k total, 828k used, 3902780k free, 1770188k cached
>
>
> Here's the vmstat output:
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b  swpd   free   buff   cache   si   so   bi   bo   in   cs  us sy id wa
>  0  0   828  53416  13228 1770304    0    0   12   22    0   24   0  4 90  6
>
>
> And finally iostat:
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            0.28    0.00    4.47    5.73    0.00   89.52
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda               4.12        22.55        56.06    5558094   13814570
> sdb               3.79        22.91        56.63    5646538   13955010
> sdc               4.06        23.30        56.80    5742544   13997936
> md0               0.00         0.02         0.00       3756         10
> md1               0.00         0.01         0.01       1592       2176
> md2               1.26         2.60         9.28     640370    2287456
> md3               9.40        46.87        58.95   11551194   14527312
>
>
> On Mon, Mar 22, 2010 at 5:02 PM, Daniel Reurich <daniel@centurion.net.nz> wrote:
>
> On Mon, 2010-03-22 at 16:49 +1300, David Harrison wrote:
> > Hi,
> > Has anyone experienced high load averages but haven't been able to see
> > the processes that are causing it?
> >
> > I've got an Ubuntu Server 9.10 instance whose load average ranges
> > between 1.0 and 3.0 for most of the day, yet tools like top and iostat
> > don't reveal any issues.
> > i.e. The load averages can be up around 1.5 whilst the maximum process
> > viewed in top is sitting at 5% of the CPU.
> >
> > Anyone know of any other good tools for identifying the cause of
> > server load if the obvious ones fail?
> >
>
> What's the wait state like (in top it's the %wa value)?
>
> Chances are that you have some serious I/O blocking going on, which could
> be a slow or failing hard disk or something like that.
>
>
> --
> Daniel Reurich.
>
> Centurion Computer Technology (2005) Ltd
> Mobile 021 797 722
>

--
Daniel Reurich.

Centurion Computer Technology (2005) Ltd
Mobile 021 797 722

--
Wellington Linux Users Group Mailing List: wellylug@lists.wellylug.org.nz
To Leave: http://lists.wellylug.org.nz/mailman/listinfo/wellylug