Yes, that is a very real possibility.<div><br></div><div>A replacement server is being shipped up tomorrow, so by the middle of next week the original server should be back here in Wellington, where it can be examined properly.</div><div><br><br>
<div class="gmail_quote">On Thu, Mar 25, 2010 at 10:42 AM, Daniel Reurich <span dir="ltr">&lt;<a href="mailto:daniel@centurion.net.nz">daniel@centurion.net.nz</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
My guess is that the power supply is no longer coping (under-spec&#39;d), or the mainboard capacitors<br>
have popped.<br>
<div><div></div><div class="h5"><br>
<br>
<br>
On Thu, 2010-03-25 at 07:51 +1300, David Harrison wrote:<br>
&gt; No, but now that you mention it: if the system is unable to write to the<br>
&gt; RAID5 that contains the log file, would anything even be logged?<br>
&gt;<br>
&gt;<br>
&gt; e.g. /var is the problematic RAID5 partition and when it locks up it<br>
&gt; takes out one or more of the physical disks.<br>
&gt;<br>
&gt;<br>
&gt; An interesting observation is that when the problem occurs it either<br>
&gt; locks up both sda &amp; sdb, or sdc by itself.<br>
&gt; I am guessing that this is because sda &amp; sdb are on the same channel,<br>
&gt; so either the channel itself is failing or one of the disks is failing and<br>
&gt; taking the other with it.<br>
&gt;<br>
&gt;<br>
&gt;<br>
&gt; David<br>
&gt;<br>
&gt;<br>
&gt;<br>
&gt;<br>
&gt; On Thu, Mar 25, 2010 at 12:14 AM, Daniel Reurich<br>
&gt; &lt;<a href="mailto:daniel@centurion.net.nz">daniel@centurion.net.nz</a>&gt; wrote:<br>
&gt;         Does anything show up in the syslog or dmesg that indicates<br>
&gt;         SATA I/O port resets or anything like that?<br>
&gt;<br>
&gt;         Daniel Reurich<br>
&gt;<br>
&gt;<br>
&gt;         On Wed, 2010-03-24 at 20:53 +1300, David Harrison wrote:<br>
&gt;         &gt; On Wed, Mar 24, 2010 at 6:36 PM, Daniel Pittman &lt;<a href="mailto:daniel@rimspace.net">daniel@rimspace.net</a>&gt; wrote:<br>
&gt;         &gt;         David Harrison &lt;<a href="mailto:david.harrison@stress-free.co.nz">david.harrison@stress-free.co.nz</a>&gt; writes:<br>
&gt;         &gt;<br>
&gt;         &gt;<br>
&gt;         &gt;         &gt; I will try the deadline scheduler tonight and see if that<br>
&gt;         &gt;         &gt; makes a difference.<br>
&gt;         &gt;<br>
&gt;         &gt;<br>
&gt;         &gt;         You should be able to make the change at run-time, through<br>
&gt;         &gt;         sysfs, I believe.<br>
&gt;         &gt;         It is a property of the hardware devices, IIRC, in sysfs.<br>
&gt;         &gt;<br>
&gt;         &gt;<br>
&gt;         &gt;<br>
&gt;         &gt;<br>
&gt;         &gt; I tried out a few of the schedulers and none of them helped the<br>
&gt;         &gt; problem.<br>
&gt;         &gt; If anything, I&#39;d have to say it got worse.<br>
&gt;         &gt;<br>
&gt;         &gt;<br>
&gt;         &gt; As a final test I have switched to the kernel that was installed<br>
&gt;         &gt; originally by Ubuntu (2.6.24-24-server).<br>
&gt;         &gt; The problem still exists, and I know for sure it didn&#39;t when things<br>
&gt;         &gt; were first set up.<br>
&gt;         &gt; - There&#39;s just no way we could have migrated 400 GB of data onto the<br>
&gt;         &gt; RAID if it was this flaky.<br>
&gt;         &gt;<br>
&gt;         &gt;<br>
&gt;         &gt; Whatever it is, it is hardware related, and it seems to be getting worse<br>
&gt;         &gt; over time...<br>
&gt;         &gt;<br>
&gt;         &gt;<br>
&gt;         &gt;<br>
&gt;         &gt;<br>
&gt;         &gt; David<br>
&gt;         &gt;<br>
&gt;         &gt;<br>
&gt;<br>
&gt;         &gt; --<br>
&gt;         &gt; Wellington Linux Users Group Mailing List:<br>
&gt;         <a href="mailto:wellylug@lists.wellylug.org.nz">wellylug@lists.wellylug.org.nz</a><br>
&gt;         &gt; To Leave:<br>
&gt;          <a href="http://lists.wellylug.org.nz/mailman/listinfo/wellylug" target="_blank">http://lists.wellylug.org.nz/mailman/listinfo/wellylug</a><br>
&gt;<br>
&gt;<br>
&gt;<br>
&gt;         --<br>
&gt;         Daniel Reurich.<br>
&gt;<br>
&gt;         Centurion Computer Technology (2005) Ltd<br>
&gt;         Mobile 021 797 722<br>
&gt;<br>
&gt;<br>
&gt;<br>
&gt;<br>
&gt;<br>
&gt;<br>
&gt;<br>
<br>
<br>
<br>
<br>
</div></div></blockquote></div><br></div>
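<div>The run-time scheduler change mentioned in the quoted thread can be sketched roughly as follows. This is only a minimal Python sketch, assuming the usual Linux sysfs layout in which each block device exposes <code>/sys/block/DEVICE/queue/scheduler</code> and the active scheduler is shown in square brackets. The helper names <code>parse_schedulers</code>, <code>scheduler_path</code> and <code>set_scheduler</code> are hypothetical, the scheduler names in the sample string are those typical of 2.6-era kernels, and writing to sysfs on a real system requires root.</div>
<pre>
```python
from pathlib import Path


def parse_schedulers(text):
    """Split the contents of a queue/scheduler file into (active, available)."""
    active = None
    available = []
    for name in text.split():
        # The kernel marks the scheduler currently in use with square brackets.
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return active, available


def scheduler_path(device, sysfs_root="/sys"):
    """Build the sysfs path for a device such as 'sda' (assumed layout)."""
    return Path(sysfs_root) / "block" / device / "queue" / "scheduler"


def set_scheduler(device, name, sysfs_root="/sys"):
    """Select an I/O scheduler for one device; needs root on a real system."""
    path = scheduler_path(device, sysfs_root)
    _, available = parse_schedulers(path.read_text())
    if name not in available:
        raise ValueError(f"{name!r} not offered for {device}: {available}")
    path.write_text(name)


if __name__ == "__main__":
    # Sample file contents only; real output depends on the kernel in use.
    print(parse_schedulers("noop anticipatory deadline [cfq]"))
```
</pre>
<div>For example, <code>set_scheduler("sda", "deadline")</code> run as root would switch sda for the current boot only; a change made this way does not persist across reboots.</div>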