[conspire] One way to test system RAM
Rick Moen
rick at linuxmafia.com
Tue Jan 23 21:16:23 PST 2007
Just following up on posts of a week or so ago. Here are some pointers on
what was _significant_ in the screen output I posted:
> [...] I faced a pleasantly unfamiliar problem: How do you
> torture-test a Linux server with prodigious (by my rather laughable
> standards) amounts of RAM?
>
> This is what I came up with:
>
> # cd /usr/src/linux-source-2.6.16
> # while : ; do make clean && make -j 256 ; done
^^^^^^
In other words, perform that (iterative) kernel compile as 256 parallel
jobs. How did I decide on 256? Excellent question. Keep reading, since
it's an important point, and since the answer was not initially obvious.
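(For an ordinary build, by the way, the usual rule of thumb is far
lower -- roughly one job per CPU, plus one, e.g.:

  # make -j $(( $(grep -c ^processor /proc/cpuinfo) + 1 ))

The 256 figure was chosen for an entirely different purpose, as
follows.)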
At first, I started with "-j 4", which would normally max out the
pitiful amount of RAM typical on _my_ machines. However, since I'd
repeatedly had trouble making _sure_ all RAM on this particular box was
being thoroughly tested, I wanted certainty on that point. I reasoned
that all RAM would definitely be under heavy stress if "vmstat"
(covered below) showed positive signs of swap-in/swap-out activity.
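If you'd rather script that ramp-up than eyeball it the way I did, a
rough sketch (the doubling sequence and the 20-sample watch window are
just illustrative) might look like:

  for j in 4 8 16 32 64 128 256; do
      make clean
      make -j "$j" &          # kick off the build in the background...
      vmstat 4 | head -n 20   # ...and watch the si/so columns meanwhile
      wait                    # let the build finish before the next step
  done

Either way, the goal is the same: keep raising the job count until swap
traffic actually appears.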
> How's it doin'? Nice of you to ask. About like this:
>
> # free
> total used free shared buffers cached
> Mem: 1556276 1515096 41180 0 1644 26752
^^^^^ ^^^^
> -/+ buffers/cache: 1486700 69576
> Swap: 1469820 249480 1220340
Figures reported are in kilobytes. The "free" number is a good sign
in this context (since I was trying for heavy memory usage) -- 41 MB
free out of 1.5 GB total physical RAM -- but the real clincher is the
scant 1.6 MB of buffers, meaning the kernel was so starved for unused
RAM that it couldn't spare more than a pittance for that purpose.
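If you want those figures pulled out mechanically rather than read off
by eye, a quick awk pass over the same output does it (field positions
per the layout above):

  # free | awk '/^Mem:/ {printf "%d MB free, %d MB buffers\n", $4/1024, $6/1024}'

Fields $4 and $6 are the "free" and "buffers" columns of the "Mem:"
line.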
> # uptime
> 22:33:15 up 5:06, 3 users, load average: 305.90, 308.94, 304.04
^^^^^^^^^^^^^^^^^^^^^^
The load-average figures are the average number of processes running or
waiting to run over the past 1, 5, and 15 minutes; here, that average
hovered around 305. A typical Linux server will have a load average of
around _0.2_. 300+ means it's absolutely creaking at the seams -- molto
grandissimo busy. Normally, if you see the load average shoot up past
100 or so, you start to worry whether the machine's going to fall over
from runaway load, and you're then not surprised when console commands
return very, very slow feedback. You can't necessarily count on a
machine climbing its way back down from a 100+ load: sometimes, you're
forced to reboot.
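The same figures live in /proc/loadavg, by the way, which makes a crude
runaway-load alarm easy to sketch -- the 100 threshold is as arbitrary
here as it was above:

  while sleep 60; do
      load=$(awk '{print int($1)}' /proc/loadavg)
      [ "$load" -gt 100 ] && echo "$(date): 1-minute load average is $load"
  done

Run that on a spare console, and you get a timestamped warning whenever
the box crosses into worrying territory.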
> # ps auxw | wc -l
> 1098
There were just shy of 1100 active processes. ("wc" is the word-count
utility; "-l" tells it to count lines rather than words.) For context,
my very busy current production server has only 91 processes alive at
this moment.
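Pedants will note that "ps auxw | wc -l" also counts the one-line
column header; if you want the tally exact, most Linux "ps" versions
will accept:

  # ps -e --no-headers | wc -l

which lists every process, one per line, with no header to inflate the
count.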
> # vmstat 4
> procs -----------memory---------- ---swap-- -----io---- -system--
> r b swpd free buff cache si so bi bo in cs
> 285 10 337012 110908 1968 27516 785 1032 838 1078 302 138
^^^^^^^^
> 350 11 324944 137436 1968 27704 2425 0 2446 63 354 239
> 351 23 317644 156884 1980 27860 511 0 529 9 278 174
> 337 22 315668 158884 1984 28048 466 0 490 17 277 129
> 338 11 306500 183596 1992 28144 2340 0 2349 4 349 247
And this (vmstat is the virtual-memory statistics utility; "vmstat 4"
prints an initial since-boot summary and then a fresh report every four
seconds) was the bellwether I kept checking as I jacked up "make -j"
from 4 to 8, 16, 32, 64, 128, and finally 256, before _finally_ seeing
swap activity in the "si" (swap in) and "so" (swap out) columns.
Thus my point, about making _sure_ you're exercising all the RAM.
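If scanning those columns by eye gets tedious, awk can flag the swap
traffic for you; in vmstat's default layout, si and so are fields 7 and
8, and "NR > 3" skips the two header lines plus the since-boot summary:

  # vmstat 4 | awk 'NR > 3 && ($7 > 0 || $8 > 0) {print "swapping: si=" $7, "so=" $8}'

It stays silent until the box actually starts paging -- which is
exactly the signal I was waiting for.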
> Dunno about you, but that puppy strikes me as a bit _busy_. ;->
The "while" loop guaranteed that compilation would keep going
indefinitely: ":" is the shell's built-in null command, which always
exits successfully, so "while :" loops forever, just like "while true".
On this occasion, running this very punishing system test overnight was
more than sufficient.
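If overnight-by-hand is too imprecise for your taste, the loop is easy
to put on a timer -- here, an eight-hour run (the duration is just an
example):

  end=$(( $(date +%s) + 8 * 3600 ))
  while [ "$(date +%s)" -lt "$end" ]; do
      make clean && make -j 256
  done

It finishes whatever compile is in flight when the deadline passes,
then stops.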