[conspire] One way to test system RAM
Rick Moen
rick at linuxmafia.com
Tue Jan 23 21:16:23 PST 2007
Just following up on posts of a week or so ago. Here are some pointers on
what was _significant_ in the screen output I posted:
> [...] I faced a pleasantly unfamiliar problem: How do you
> torture-test a Linux server with prodigious (by my rather laughable
> standards) amounts of RAM?
>
> This is what I came up with:
>
> # cd /usr/src/linux-source-2.6.16
> # while : ; do make clean && make -j 256 ; done
^^^^^^
In other words, perform that (iterative) kernel compile as 256 parallel
jobs. How did I decide on 256? Excellent question. Keep reading, since
it's an important point, and since the answer was not initially obvious.
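(For an ordinary build, by the way, the usual rule of thumb is far
lower -- roughly one job per CPU, plus one, e.g.:

  # make -j $(( $(grep -c ^processor /proc/cpuinfo) + 1 ))

The 256 figure was chosen for an entirely different purpose, as
follows.)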
At first, I started with "-j 4", which would normally max out the
pitiful amount of RAM typical on _my_ machines. However, since I'd
repeatedly had trouble making _sure_ all RAM on this particular box was
being thoroughly tested, I wanted certainty on that point. I reasoned
that all RAM would definitely be under heavy stress if "vmstat"
(covered below) showed positive signs of swap-in/swap-out activity.
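If you'd rather script that ramp-up than eyeball it the way I did, a
rough sketch (the doubling sequence and the 20-sample watch window are
just illustrative) might look like:

  for j in 4 8 16 32 64 128 256; do
      make clean
      make -j "$j" &          # kick off the build in the background...
      vmstat 4 | head -n 20   # ...and watch the si/so columns meanwhile
      wait                    # let the build finish before the next step
  done

Either way, the goal is the same: keep raising the job count until swap
traffic actually appears.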
> How's it doin'? Nice of you to ask. About like this:
>
> # free
> total used free shared buffers cached
> Mem: 1556276 1515096 41180 0 1644 26752
^^^^^ ^^^^
> -/+ buffers/cache: 1486700 69576
> Swap: 1469820 249480 1220340
Figures reported are in kilobytes. The "free" number is a good sign
in this context (since I was trying for heavy memory usage) -- 41 MB
free out of 1.5 GB total physical RAM -- but the real clincher is the
scant 1.6 MB of buffers, meaning the kernel was so starved for unused
RAM that it couldn't spare more than a pittance for that purpose.
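If you want those figures pulled out mechanically rather than read off
by eye, a quick awk pass over the same output does it (field positions
per the layout above):

  # free | awk '/^Mem:/ {printf "%d MB free, %d MB buffers\n", $4/1024, $6/1024}'

Fields $4 and $6 are the "free" and "buffers" columns of the "Mem:"
line.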
> # uptime
> 22:33:15 up 5:06, 3 users, load average: 305.90, 308.94, 304.04
^^^^^^^^^^^^^^^^^^^^^^
The load-average figures are the average number of processes running or
waiting to run over the past 1, 5, and 15 minutes; here, that average
hovered around 305. A typical Linux server will have a load average of
around _0.2_. 300+ means it's absolutely creaking at the seams -- molto
grandissimo busy. Normally, if you see the load average shoot up past
100 or so, you start to worry whether the machine's going to fall over
from runaway load, and you're then not surprised when console commands
return very, very slow feedback. You can't necessarily count on a
machine climbing its way back down from a 100+ load: sometimes, you're
forced to reboot.
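The same figures live in /proc/loadavg, by the way, which makes a crude
runaway-load alarm easy to sketch -- the 100 threshold is as arbitrary
here as it was above:

  while sleep 60; do
      load=$(awk '{print int($1)}' /proc/loadavg)
      [ "$load" -gt 100 ] && echo "$(date): 1-minute load average is $load"
  done

Run that on a spare console, and you get a timestamped warning whenever
the box crosses into worrying territory.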
> # ps auxw | wc -l
> 1098
There were just shy of 1100 active processes. ("wc" is the word-count
utility; "-l" tells it to count lines rather than words.) For context,
my very busy current production server has only 91 processes alive at
this moment.
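Pedants will note that "ps auxw | wc -l" also counts the one-line
column header; if you want the tally exact, most Linux "ps" versions
will accept:

  # ps -e --no-headers | wc -l

which lists every process, one per line, with no header to inflate the
count.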
> # vmstat 4
> procs -----------memory---------- ---swap-- -----io---- -system--
> r b swpd free buff cache si so bi bo in cs
> 285 10 337012 110908 1968 27516 785 1032 838 1078 302 138
^^^^^^^^
> 350 11 324944 137436 1968 27704 2425 0 2446 63 354 239
> 351 23 317644 156884 1980 27860 511 0 529 9 278 174
> 337 22 315668 158884 1984 28048 466 0 490 17 277 129
> 338 11 306500 183596 1992 28144 2340 0 2349 4 349 247
And this (vmstat is the virtual-memory statistics utility; "vmstat 4"
prints an initial since-boot summary and then a fresh report every four
seconds) was the bellwether I kept checking as I jacked up "make -j"
from 4 to 8, 16, 32, 64, 128, and finally 256, before _finally_ seeing
swap activity in the "si" (swap in) and "so" (swap out) columns.
Thus my point, about making _sure_ you're exercising all the RAM.
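If scanning those columns by eye gets tedious, awk can flag the swap
traffic for you; in vmstat's default layout, si and so are fields 7 and
8, and "NR > 3" skips the two header lines plus the since-boot summary:

  # vmstat 4 | awk 'NR > 3 && ($7 > 0 || $8 > 0) {print "swapping: si=" $7, "so=" $8}'

It stays silent until the box actually starts paging -- which is
exactly the signal I was waiting for.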
> Dunno about you, but that puppy strikes me as a bit _busy_. ;->
The "while" loop guaranteed that compilation would keep going
indefinitely: ":" is the shell's built-in null command, which always
exits successfully, so "while :" loops forever, just like "while true".
On this occasion, running this very punishing system test overnight was
more than sufficient.
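If overnight-by-hand is too imprecise for your taste, the loop is easy
to put on a timer -- here, an eight-hour run (the duration is just an
example):

  end=$(( $(date +%s) + 8 * 3600 ))
  while [ "$(date +%s)" -lt "$end" ]; do
      make clean && make -j 256
  done

It finishes whatever compile is in flight when the deadline passes,
then stops.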