Just a follow-up: this thread describes my problem exactly:
http://centos.org/modules/newbb/viewtopic.php?viewmode=flat&order=ASC&topic_id=22554&forum=37

The short story is that even though smartctl reports no issue, there is probably a hardware issue.

Below is the output from iostat on the server while the disk problem is taking place.
The utilisation of sda and sdb is 100% even though they are hardly doing anything.
The lockup lasts 15-20 seconds until something in the kernel/hardware resets, and then it is happy again.
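(For anyone wanting the same view, these are iostat's extended per-device statistics; something like the following produces these columns - the 5-second interval is only an example, not necessarily what I used:)

    iostat -x 5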
Output of iostat while the problem is taking place:

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           0.38   0.00     0.00    99.62    0.00   0.00

Device:  rrqm/s  wrqm/s   r/s    w/s  rsec/s  wsec/s  avgrq-sz  avgqu-sz    await    svctm   %util
sda        0.00    1.75  0.25   0.50    2.00   14.00     21.33     10.73  4300.00  1333.33  100.00
sdb        0.00    1.75  0.50   0.25    4.00   28.00     42.67     10.10  3340.00  1333.33  100.00
sdc        1.50    0.00  0.25   0.50   14.00    4.00     24.00      0.01    10.00    10.00    0.75
md0        0.00    0.00  0.00   0.00    0.00    0.00      0.00      0.00     0.00     0.00    0.00
md1        0.00    0.00  0.00   0.00    0.00    0.00      0.00      0.00     0.00     0.00    0.00
md2        0.00    0.00  0.00   0.00    0.00    0.00      0.00      0.00     0.00     0.00    0.00
md3        0.00    0.00  0.25   2.75    2.00   22.00      8.00      0.00     0.00     0.00    0.00
Output of iostat when the problem is not taking place:

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           0.62   0.00     4.62     6.50    0.00  88.25

Device:  rrqm/s  wrqm/s   r/s    w/s  rsec/s  wsec/s  avgrq-sz  avgqu-sz    await    svctm   %util
sda        4.50   14.25  3.75  13.00   66.00  234.00     17.91      0.08     4.93     4.63    7.75
sdb        6.50   12.25  5.25  12.00   94.00  210.00     17.62      0.12     7.25     6.09   10.50
sdc        7.25   11.00  4.75  12.25   96.00  202.00     17.53      0.10     5.88     4.85    8.25
md0        0.00    0.00  0.00   0.00    0.00    0.00      0.00      0.00     0.00     0.00    0.00
md1        0.00    0.00  0.00   0.00    0.00    0.00      0.00      0.00     0.00     0.00    0.00
md2        0.00    0.00  0.00   1.50    0.00   12.00      8.00      0.00     0.00     0.00    0.00
md3        0.00    0.00  1.75  33.00   14.00  264.00      8.00      0.00     0.00     0.00    0.00

This suggests that either there's a problem with sda and sdb, or there's an issue with the SATA controller which is leaving both hanging. Annoying either way...
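(One check that should help separate a drive fault from a controller fault - assuming libata logs the resets, which it usually does - is to watch the kernel log while a stall is happening; errors against one port point at a drive, errors across both point at the controller:)

    # look for recent ATA resets / errors
    dmesg | grep -iE 'ata[0-9]+|reset|timeout|error' | tail -n 50

    # or follow the syslog live during a stall
    tail -f /var/log/messages | grep -iE 'ata|sd[abc]'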

David


On Wed, Mar 24, 2010 at 8:27 AM, David Harrison <david.harrison@stress-free.co.nz> wrote:
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">Hi,<div>I ran the smartctl tests (both short and long) on all three physical drives overnight.</div><div>It showed all drives were working 100% correctly.</div>

Overnight I also ran a number of read/write tests and monitored the I/O status in vmstat and iostat.

It seems like performance falls through the floor as soon as the physical memory on the server is exhausted.

The issue I am experiencing seems very similar to the one documented here:
http://notemagnet.blogspot.com/2008/08/linux-write-cache-mystery.html

I've checked the kernel parameters mentioned in that article (dirty_ratio and dirty_background_ratio) and they are already set to the recommended values.

Putting more RAM in the machine will certainly forestall the issue, but beyond that it may be a case of trying RAID-1 instead of RAID-5.


David
<div></div><div class="h5"><div><br></div><div><br><br>
<div class="gmail_quote">On Tue, Mar 23, 2010 at 9:42 AM, David Harrison <span dir="ltr"><<a href="mailto:david.harrison@stress-free.co.nz" target="_blank">david.harrison@stress-free.co.nz</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Cheers Daniel.
Will do tonight, out of hours.

I hope it isn't one of the drives; they are brand new, and the server's in Auckland whilst I am here in Wellington...


David


On Tue, Mar 23, 2010 at 9:20 AM, Daniel Reurich <daniel@centurion.net.nz> wrote:
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
On Tue, 2010-03-23 at 08:50 +1300, David Harrison wrote:
> Thanks Daniel, that switch for vmstat is very handy and I'd completely
> missed the I/O wait value in top.
> Googling based on your comments also brought up this page, which is
> very handy:
> http://strugglers.net/wiki/Linux_performance_tuning
>
>
> The problem is certainly looking like an intermittent I/O issue.
>
Probably caused by a disk issue - seriously. I have seen this before.
>
> Has anyone experience with the performance boost of a dedicated PCI-X
> SATA controller for software RAID?

>
> The server in question is a bog-standard HP ML110.
> It isn't up to their needs, but it was a recent purchase by the
> previous IT guys, so I'm afraid it is staying.
>
I don't think it should be an issue.
>
> For practical reasons I want to keep the software RAID-5 (3x1TB
> drives), but would putting some (or all) of these disks onto a
> dedicated controller alleviate the I/O issue?
>
I'd say it would provide none to minimal gain given your load stats if
you are using software RAID, and a small gain if you get hardware RAID.
>
> i.e. Is it worth recommending a $400 PCI-X SATA controller for the
> box, or is that money better left on the table for a new server
> (ML310/330) in twelve months' time?
> (My concern being that the card goes in and the problem remains the
> same.)

Save the money unless you can be sure it's the built-in controller (I
doubt it is).

I think your next port of call is to do a SMART check on your drives.
Install smartmontools and do a smartctl -s on /dev/sdX for each drive,
and report back if you don't understand the results.

Regards,
Daniel.
<font color="#888888"><br>
<br>
<br>
<br>
<br>
--<br>
</font><div><div></div><div>Wellington Linux Users Group Mailing List: <a href="mailto:wellylug@lists.wellylug.org.nz" target="_blank">wellylug@lists.wellylug.org.nz</a><br>
To Leave: <a href="http://lists.wellylug.org.nz/mailman/listinfo/wellylug" target="_blank">http://lists.wellylug.org.nz/mailman/listinfo/wellylug</a><br>
</div></div></blockquote></div><br></div>
</div></div></blockquote></div><br></div>
</div></div></blockquote></div><br>