Thanks Daniel, that switch for vmstat is very handy, and I&#39;d completely missed the iowait value in top.<div>Googling based on your comments also turned up this page, which is useful:</div><div><a href="http://strugglers.net/wiki/Linux_performance_tuning">http://strugglers.net/wiki/Linux_performance_tuning</a></div>
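<div><br></div><div>In case it helps anyone else chasing the same thing, here is a rough Python sketch for logging the iowait percentage over time instead of eyeballing top. It is only an illustration: the function names, the one-second interval and the 50% threshold are my own arbitrary choices, and it assumes a Linux /proc/stat whose first line is the aggregate &quot;cpu&quot; line with the usual field order.</div>

```python
import time


def parse_cpu_line(line):
    """Return (iowait, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Assumed field order: user nice system idle iowait irq softirq steal.
    # Guest time (if present) is already counted in user, so it is not summed.
    values = [int(v) for v in fields[1:9]]
    return values[4], sum(values)


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two 'cpu' lines."""
    io0, total0 = parse_cpu_line(before)
    io1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    if elapsed > 0:
        return 100.0 * (io1 - io0) / elapsed
    return 0.0


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat (the aggregate 'cpu' line)."""
    with open(path) as f:
        return f.readline()


def watch(interval=1.0, samples=10, threshold=50.0):
    """Print one iowait reading per interval, marking readings at or above threshold."""
    prev = read_cpu_line()
    for _ in range(samples):
        time.sleep(interval)
        cur = read_cpu_line()
        pct = iowait_percent(prev, cur)
        flag = "  HIGH" if pct >= threshold else ""
        print(f"{time.strftime('%H:%M:%S')} iowait {pct:5.1f}%{flag}")
        prev = cur
```

<div>Calling watch() prints ten one-second readings; each reading is the difference between two consecutive snapshots, so none of them is a since-boot average. Left running with a larger samples value, it should show whether the spikes line up with a particular time of day.</div>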

<div><br></div><div>The problem is certainly looking like an intermittent I/O issue.</div><div><br></div><div>Does anyone have experience with the performance boost from a dedicated PCI-X SATA controller for software RAID?</div>

<div><br></div><div>The server in question is a bog-standard HP ML110.</div><div>It isn&#39;t up to their needs, but it was a recent purchase by the previous IT guys, so I&#39;m afraid it is staying.</div><div><br></div><div>

For practical reasons I want to keep the software RAID-5 (3x1TB drives), but would putting some (or all) of these disks onto a dedicated controller alleviate the I/O issue?</div><div><br></div><div>i.e. is it worth recommending a $400 PCI-X SATA controller for the box, or is that money better left on the table for a new server (ML310/330) in twelve months&#39; time?</div>

<div>(My concern being that the card goes in and the problem remains the same.)</div><div><br></div><div><br></div><div>David</div><div><br></div><div><br><br><div class="gmail_quote">On Mon, Mar 22, 2010 at 10:51 PM, Daniel Reurich <span dir="ltr">&lt;<a href="mailto:daniel@centurion.net.nz">daniel@centurion.net.nz</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">The top is interesting as it shows an iowait of 87%<br>

<br>

For the current state, you need to run vmstat like this:<br>

# vmstat 1 10<br>

<br>

The first result is an average since the last reboot, and the following<br>

ones are the state every second for 10 samples.  (See the man page).<br>

<br>

My initial thought is that, on your RAID array, it may be caused by the<br>

periodic RAID scrub (which runs a check over the entire array), so I&#39;d<br>

check that via either &quot;cat /proc/mdstat&quot; or &quot;mdadm -E /dev/md*&quot;<br>

<br>

Also check that you don&#39;t have more than 1 disk at a time doing SMART<br>

checks, as that really kills server performance.<br>

<font color="#888888"><br>

Daniel Reurich<br>

</font><div><div></div><div class="h5"><br>

<br>

On Mon, 2010-03-22 at 17:12 +1300, David Harrison wrote:<br>

&gt; I thought it was I/O bound too as it is running a software RAID5<br>

&gt; array.<br>

&gt; I would like it to be better, but the client can&#39;t afford hardware<br>

&gt; upgrades right now.<br>

<br>

&gt;<br>

&gt; Does 6% IO wait time (from vmstat) constitute really bad disk<br>

&gt; performance?<br>

&gt; I&#39;ve got systems with much higher wait times that have far lower<br>

&gt; loads.<br>

&gt;<br>

&gt;<br>

&gt; Here&#39;s the header output from top (the two active processes are<br>

&gt; registering 1% cpu load each):<br>

&gt;<br>

&gt; top - 17:05:54 up 2 days, 20:25,  1 user,  load average: 2.74, 1.17,<br>

&gt; 0.74<br>

&gt; Tasks:  71 total,   2 running,  69 sleeping,   0 stopped,   0 zombie<br>

&gt; Cpu(s):  0.0%us,  0.1%sy,  0.0%ni, 14.3%id, 85.4%wa,  0.0%hi,  0.1%<br>

&gt; si,  0.0%st<br>

&gt; Mem:   2074112k total,  2020768k used,    53344k free,    13128k<br>

&gt; buffers<br>

&gt; Swap:  3903608k total,      828k used,  3902780k free,  1770188k<br>

&gt; cached<br>

&gt;<br>

&gt;<br>

&gt;<br>

&gt;<br>

&gt; Here&#39;s the vmstat output:<br>

&gt; procs -----------memory---------- ---swap-- -----io---- -system--<br>

&gt; ----cpu----<br>

&gt;  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us<br>

&gt; sy id wa<br>

&gt;  0  0    828  53416  13228 1770304    0    0    12    22    0   24  0<br>

&gt;  4 90  6<br>

&gt;<br>

&gt;<br>

&gt; And finally iostat<br>

&gt; avg-cpu:  %user   %nice %system %iowait  %steal   %idle<br>

&gt;                   0.28     0.00        4.47      5.73      0.00<br>

&gt;  89.52<br>

&gt;<br>

&gt;<br>

&gt; Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn<br>

&gt; sda               4.12        22.55        56.06    5558094   13814570<br>

&gt; sdb               3.79        22.91        56.63    5646538   13955010<br>

&gt; sdc               4.06        23.30        56.80    5742544   13997936<br>

&gt; md0               0.00         0.02         0.00       3756         10<br>

&gt; md1               0.00         0.01         0.01       1592       2176<br>

&gt; md2               1.26         2.60         9.28     640370    2287456<br>

&gt; md3               9.40        46.87        58.95   11551194   14527312<br>

&gt;<br>

&gt;<br>

&gt;<br>

&gt;<br>

&gt; On Mon, Mar 22, 2010 at 5:02 PM, Daniel Reurich<br>

&gt; &lt;<a href="mailto:daniel@centurion.net.nz">daniel@centurion.net.nz</a>&gt; wrote:<br>

&gt;<br>

&gt;         On Mon, 2010-03-22 at 16:49 +1300, David Harrison wrote:<br>

&gt;         &gt; Hi,<br>

&gt;         &gt; Has anyone experienced high-load averages but haven&#39;t been<br>

&gt;         able to see<br>

&gt;         &gt; processes that are causing it?<br>

&gt;         &gt;<br>

&gt;         &gt;<br>

&gt;         &gt; I&#39;ve got an Ubuntu Server 9.10 instance whose load average<br>

&gt;         ranges<br>

&gt;         &gt; between 1.0 and 3.0 for most of the day, yet tools like top<br>

&gt;         and iostat<br>

&gt;         &gt; don&#39;t reveal any issues.<br>

&gt;         &gt; i.e. The load averages can be up around 1.5 whilst the<br>

&gt;         maximum process<br>

&gt;         &gt; viewed in top is sitting at 5% of the CPU.<br>

&gt;         &gt;<br>

&gt;         &gt;<br>

&gt;         &gt; Anyone know of any other good tools for identifying the<br>

&gt;         cause of<br>

&gt;         &gt; server load if the obvious ones fail?<br>

&gt;         &gt;<br>

&gt;<br>

&gt;         What&#39;s the wait state like (in top it&#39;s the %wa value)?<br>

&gt;<br>

&gt;         Chances are that you have some serious I/O blocking going on<br>

&gt;         which could<br>

&gt;         be a slow or failing harddisk or something like that.<br>

&gt;<br>

&gt;<br>

&gt;<br>

&gt;         --<br>

&gt;         Daniel Reurich.<br>

&gt;<br>

&gt;         Centurion Computer Technology (2005) Ltd<br>

&gt;         Mobile 021 797 722<br>

&gt;<br>

&gt;<br>

&gt;<br>

&gt;<br>

&gt;<br>

&gt;         --<br>

&gt;         Wellington Linux Users Group Mailing List:<br>

&gt;         <a href="mailto:wellylug@lists.wellylug.org.nz">wellylug@lists.wellylug.org.nz</a><br>

&gt;         To Leave:<br>

&gt;          <a href="http://lists.wellylug.org.nz/mailman/listinfo/wellylug" target="_blank">http://lists.wellylug.org.nz/mailman/listinfo/wellylug</a><br>

&gt;<br>

&gt;<br>

&gt;<br>

<br>

<br>

</div></div>--<br>

<div><div></div><div class="h5">Daniel Reurich.<br>

<br>

Centurion Computer Technology (2005) Ltd<br>

Mobile 021 797 722<br>

<br>

<br>

<br>

<br>


</div></div></blockquote></div><br></div>