[wellylug] High load averages but no apparent cause

Daniel Pittman daniel at rimspace.net
Wed Mar 24 14:01:51 NZDT 2010


David Harrison <david.harrison at stress-free.co.nz> writes:

> I ran the smartctl tests (both short and long) on all three physical drives
> overnight.  It showed all drives were working 100% correctly.
>
> Overnight I also ran a number of read/write tests and monitored the i/o
> status in vmstat and iostat.
>
> It seems like performance falls through the floor as soon as the physical
> memory on the server is exhausted.
>
> The issue I am experiencing seems to be very similar to the issue which is
> documented here:
> http://notemagnet.blogspot.com/2008/08/linux-write-cache-mystery.html

If I recall correctly, and I may not, there was a known issue on some older
kernels where the I/O scheduler introduced long stalls.  It was a bug in the
CFQ scheduler code, IIRC, which is why tuning the writeback periods, changing
to another scheduler like anticipatory (AS) or deadline, or using a newer
kernel would resolve it.
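
As a rough sketch of how you might test that, assuming your array members
are sda, sdb, and sdc (substitute your actual devices): the sysfs interface
lets you switch schedulers at runtime, without a reboot:

    # show the available schedulers; the active one appears in [brackets]
    cat /sys/block/sda/queue/scheduler

    # switch sda to deadline on the fly (as root); repeat for sdb and sdc
    echo deadline > /sys/block/sda/queue/scheduler

The change takes effect immediately, so you can re-run your read/write tests
straight away and compare.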

> I've checked the kernel parameters that are mentioned in this article
> (dirty_ratio and dirty_background_ratio) and they are the values that are
> recommended.

You might try another I/O scheduler and see if it helps.  A newer kernel, if
your distribution has one, is another possible path.
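
If a different scheduler does turn out to help, you can make it the default
at boot with the elevator= kernel parameter.  The kernel line below is
purely illustrative, so adapt the paths to your own bootloader config:

    # append elevator=deadline to the kernel line, e.g. in
    # /boot/grub/menu.lst:
    kernel /boot/vmlinuz-2.6.x root=/dev/md0 ro elevator=deadline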


> Putting more RAM in the machine will certainly forestall the issue, but
> beyond that it may be a case of trying RAID1 instead of RAID5.

FWIW, I don't see this sort of behaviour on machines with MD RAID5 or RAID6.

They are otherwise quite different to (my understanding of) your system
configuration, so this just adds the data point that it isn't universal to all
uses of those tools.
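
If you want to narrow it down further, a rough sketch of checking whether
the stalls localise to one member of the array rather than to the md device
as a whole (again assuming sda/sdb/sdc as the members):

    # extended per-device stats every 5 seconds; compare await and %util
    # on the individual members against the md device during a stall
    iostat -x 5

    # processes stuck in uninterruptible sleep (state D) are usually the
    # ones blocked on that I/O
    ps -eo state,pid,cmd | awk '$1 == "D"'

A single member showing much higher await than its siblings would point
back at that drive or its cabling, despite the clean SMART results.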

        Daniel
-- 
✣ Daniel Pittman            ✉ daniel at rimspace.net            ☎ +61 401 155 707
               ♽ made with 100 percent post-consumer electrons


