[sf-lug] upgrading versus wiping and installing new

Rick Moen rick at linuxmafia.com
Mon Mar 10 22:51:28 PDT 2014


I wrote:
> Quoting Jim Stockford (jim at systemateka.com):

> > I've set up server boxes with multiple 2 GB swap partitions
> > distributed across the used storage. Red Hat docs of 2008 or so state
> > that swap parts greater than 2 GB are inefficient (or partly unusable,
> > I forget) because of kernel code design limits.
> 
> There's a funny story I'll have to tell you about that, some time.
> For now:  Yes, back _then_, there were real, serious problems using swap
> partitions sized over 2GB each.

This story was around 2004, when I was working for California Digital
Corporation aka CDC, a small company in Fremont that had sort-of taken
over VA Linux Systems's hardware division when Larry Augustin decided to
abandon the hardware business and refocus the firm on proprietary
software.  (VA reportedly wasn't willing to let CDC have any of its
unused trademarks or customer information, but sold CDC its remaining
hardware inventory.)

One of our customers for 1U rackmount servers was a big EDA firm in the
South Bay, hereinafter 'Customer'.  Customer ordered big batches of dual
Xeon systems kickstarted to RHEL3, and was very clear on wanting only
RHEL-packaged software with no exceptions.  If the customer wanted an
8GB RAM machine, we'd typically kickstart it with a partition map on the
boot drive that included qty. eight of 2GB swap partitions.
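
For anyone curious, that layout is just a stack of repeated 'part'
lines in the kickstart file's partitioning section, roughly like this
(the 2GB swap sizes are the real point; the disk name and the non-swap
partitions here are only illustrative, not quoted from our actual
config):

  part /boot --size=100  --ondisk=sda
  part /     --size=8192 --ondisk=sda --grow
  part swap  --size=2048 --ondisk=sda
  part swap  --size=2048 --ondisk=sda
  part swap  --size=2048 --ondisk=sda
  part swap  --size=2048 --ondisk=sda
  part swap  --size=2048 --ondisk=sda
  part swap  --size=2048 --ondisk=sda
  part swap  --size=2048 --ondisk=sda
  part swap  --size=2048 --ondisk=sda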

Customer contacted us and said 'Why not a single 16GB swap partition?
Wouldn't that be at least an iota faster and more efficient?'  We said
(paraphrasing):  'We are sticking to swap partitions no bigger than 2GB
out of conservative best practices.'  Customer ordered us to switch to
one big swap partition instead.  Customer's cash was green, and we
certainly weren't going to argue, so we adjusted the kickstart.  The
production line ground out product, product got delivered, and we waited
a couple of weeks.

Oops, Customer is very unhappy.  The 1U boxes are seizing up about every
four days, requiring a reboot to regain control.  They want to RMA
everything.  We go fetch a couple dozen examples, and run them through
the same Cerberus Test Control System (CTCS)[1] burn-in/torture-testing
suite we use to quality-check newly built units by putting them under
extreme stress for some days per unit.  (Manufactured units were,
critically, tested with a default RHEL build and only later
custom-kickstarted to Customer's specs.)

We found:[2]

1.  Units with a single 16GB swap partition exhausted RAM in 1 day.
2.  Units with two 8GB swap partitions exhausted RAM in 2 days.
3.  Units with four 4GB swap partitions exhausted RAM in 1.5 weeks.
4.  Units with eight 2GB swap partitions never exhausted RAM.
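
(That kind of slow exhaustion is easy to watch while a stress suite is
running:  'swapon -s' lists the configured swap areas, and keeping an
eye on 'free -m' or 'vmstat 5' shows free memory and then swap steadily
filling up over the days.)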

RHEL3 came with RH's vendor-patched 2.4 kernel carrying backported
patches from the 2.6 series, including quite a few patches to the VM
code.

We tried, experimentally, the same tests using a kernel.org vanilla 2.6
kernel that we compiled:  This never exhausted RAM regardless of
partitioning.

Customer asked for our conclusions.  We said the diplomatic version of
this:  Evidently, by insisting on using a rare swap configuration with
RH's custom 2.4 kernel, you triggered a bug in the patched VM code that
most major intensive-usage sites unconsciously avoid by going with the
traditional practice of limiting each swap partition to 2GB.  We had no
way of knowing this would happen, but our experience and instincts would
have led us to veer away from a novel configuration of something as
important as swap on a high-traffic system, which would have avoided
triggering the bug.

Customer's obvious remedies were either:

o Use a non-vendor 2.6 kernel, or 
o Use 2GB swap partitions for now, while other people serve as the 
  pioneers and get the arrows in their backs.

Customer opted for choice #2.  Which meant finally letting us do what
we'd recommended in the first place.


[1] 'Burn-in' on http://linuxmafia.com/kb/Hardware
[2] Figures are inexact on account of time passed.





