[wellylug] High load averages but no apparent cause
David Harrison
david.harrison at stress-free.co.nz
Sun Mar 28 20:33:19 NZDT 2010
Just an update: the replacement server arrived in Auckland on Friday and all
data and services have been migrated.
The migration went without a hitch because it appears the RAID5 lockup only
occurs during write operations,
i.e. 500 GB of data was pulled from the device without a single lock-up.
The migration did reveal something about Ubuntu's default RAID5
configuration: it is very poorly tuned :-)
For anyone using Ubuntu and RAID5 I recommend you check out the following
couple of pages and set your stripe_cache_size accordingly:
http://randomitblog.blogspot.com/2009/10/ubuntu-raid-tweak.html
http://peterkieser.com/2009/11/29/raid-mdraid-stripe_cache_size-vs-write-transfer/
By changing the stripe_cache_size parameter to 16meg I saw a 10x improvement
in write operations.
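
For anyone who wants to try this, here is a rough sketch of the tuning step, not a definitive recipe. It assumes the array is /dev/md0 (a hypothetical name, substitute your own), that "16meg" corresponds to a stripe_cache_size value of 16384 entries (my reading, the setting is a count of cache entries rather than bytes), and that the array has three member disks. The kernel md documentation gives the memory cost as page_size * nr_disks * stripe_cache_size and, as far as I know, accepts values from 17 to 32768.

```python
from pathlib import Path

# Bounds as described in the kernel md documentation (assumed still current).
MIN_ENTRIES = 17
MAX_ENTRIES = 32768


def stripe_cache_bytes(entries, nr_disks, page_size=4096):
    """Estimate RAM used by the stripe cache: page_size * nr_disks * entries."""
    return page_size * nr_disks * entries


def stripe_cache_path(md_name, sysfs_root="/sys"):
    """Build the sysfs path of the stripe_cache_size attribute for an md array."""
    return Path(sysfs_root) / "block" / md_name / "md" / "stripe_cache_size"


def set_stripe_cache_size(md_name, entries, sysfs_root="/sys"):
    """Write a new stripe_cache_size and return the value read back.

    Needs root when pointed at the real /sys. The setting does not
    survive a reboot, so it has to be reapplied at boot time.
    """
    if not MIN_ENTRIES <= entries <= MAX_ENTRIES:
        raise ValueError(f"stripe_cache_size must be {MIN_ENTRIES}..{MAX_ENTRIES}")
    path = stripe_cache_path(md_name, sysfs_root)
    path.write_text(f"{entries}\n")
    return int(path.read_text())


if __name__ == "__main__":
    # Hypothetical example: 3-disk RAID5, 16384 entries, 4 KiB pages.
    cost = stripe_cache_bytes(16384, nr_disks=3)
    print(f"{stripe_cache_path('md0')} = 16384 would use {cost // 2**20} MiB of RAM")
```

It is worth checking the memory estimate before applying the value, since the cache is taken from RAM and grows with the number of disks in the array.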
David
On Thu, Mar 25, 2010 at 10:50 AM, David Harrison <
david.harrison at stress-free.co.nz> wrote:
> Yes that could be a very real possibility.
>
> A replacement server is being shipped up tomorrow, so by mid-next week it
> should be back here in Wellington where it can be better examined...
>
>
> On Thu, Mar 25, 2010 at 10:42 AM, Daniel Reurich <daniel at centurion.net.nz> wrote:
>
>> Power supply not coping anymore (under spec'd) or mainboard capacitors
>> popped is my guess.
>>
>>
>>
>> On Thu, 2010-03-25 at 07:51 +1300, David Harrison wrote:
>> > No, but now that you say that, if the system is unable to write to the
>> > RAID5 which contains the log file, would this even happen?
>> >
>> >
>> > e.g. /var is the problematic RAID5 partition and when it locks up it
>> > takes out one or more of the physical disks.
>> >
>> >
>> > An interesting observation is that when the problem occurs it either
>> > locks up both sda & sdb, or sdc by itself.
>> > I am guessing that this is because sda & sdb are on the same channel,
>> > so either the channel itself is failing, or one of the disks is and it is
>> > taking the other down with it.
>> >
>> >
>> >
>> > David
>> >
>> >
>> >
>> >
>> > On Thu, Mar 25, 2010 at 12:14 AM, Daniel Reurich
>> > <daniel at centurion.net.nz> wrote:
>> > Does anything show up in the syslog or dmesg that indicates SATA I/O
>> > port resets or anything like that?
>> >
>> > Daniel Reurich
>> >
>> >
>> > On Wed, 2010-03-24 at 20:53 +1300, David Harrison wrote:
>> > > On Wed, Mar 24, 2010 at 6:36 PM, Daniel Pittman <daniel at rimspace.net> wrote:
>> > > David Harrison <david.harrison at stress-free.co.nz> writes:
>> > >
>> > > > I will try the deadline scheduler tonight and see if that makes a
>> > > > difference.
>> > >
>> > > You should be able to make the change at run-time, through sysfs, I believe.
>> > > It is a property of the hardware devices, IIRC, in sysfs.
>> > >
>> > > I tried out a few of the schedulers and none of them helped the problem.
>> > > If anything I'd have to say it got worse.
>> > >
>> > > As a final test I have switched to the kernel that was installed
>> > > originally by Ubuntu (2.6.24-24-server).
>> > > The problem still exists and I know for sure it didn't when things
>> > > were first setup.
>> > > - There's just no way we could have migrated 400gig of data onto the
>> > > RAID if it was this flakey.
>> > >
>> > > Whatever it is is hardware related, and it seems to be getting worse
>> > > over time...
>> > >
>> > >
>> > >
>> > >
>> > > David
>> > >
>> > >
>> >
>> > > --
>> > > Wellington Linux Users Group Mailing List:
>> > wellylug at lists.wellylug.org.nz
>> > > To Leave:
>> > http://lists.wellylug.org.nz/mailman/listinfo/wellylug
>> >
>> >
>> >
>> > --
>> > Daniel Reurich.
>> >
>> > Centurion Computer Technology (2005) Ltd
>> > Mobile 021 797 722
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>>
>>
>>
>>
>>
>
>