[conspire] looking for good FAQ Websites for CPU heatsink/Fan hardware

Rick Moen rick at linuxmafia.com
Fri Apr 4 03:16:08 PDT 2008


Quoting Eric De MUND (ead-conspire at ixian.com):

> Speeze CPU fans are the only ones I recommended and used myself. Origi-
> nally, it might even have been on the recommendation of CABALists Ross
> or Rick.

Probably Ross.  I've not bought a separate CPU fan in a long while -- for
one of my own systems, anyway.  I guess I'd look for one with
significant fins / heatsink (conductive effect) and _try_ to get a fan
with decent bearings, i.e., _not_ the usual cheap sleeve bearings.  Be
aware that you need to make sure there's sufficient physical clearance
(above and around the CPU) for whatever you buy.

I've seen what happens if you _don't_ use proper CPU fans:  I got flown
out, a few years ago, to Rochester, NY to inspect a server farm of
dual-Opteron 1U rackmount servers that were reported to have "problems".  
That wasn't the half of it:  It turned out that about half the units had
gone down because of failed CPU fans (seized up -- i.e., because of
those damned cheap bearings).  When that happened, some of the units' 
Opterons merely failed.  Those were the lucky ones.  On the unlucky
ones, the motherboards were actually _charred_ in the vicinity of the
burned-out CPUs.

So, the guys who flew me out (who shall go nameless) had to pay for
a hundred or so replacement Opteron CPUs, and an equally
impressive number of Tyan server motherboards -- plus my time to fix
them all.


[The problem of bad RAM:]

> Though I've found memtest86 to be a superbly useful tool, getting myself
> to actually suspect RAM problems has sometimes been a tricky hump to get
> over. 
[...]
> I've heard tell that some folks have used prime95 for this purpose, but
> I don't know that I ever knew the particulars. 

I've actually never used prime95.  (In fact, I'm not certain I recognise
the name.)  Now, memtest86 / memtest86+ are the old standbys --
_mostly_ -- that you use in those famous situations when you seriously
suspect there's bad RAM.  It's important to know two things about that
pair of programs:  One, you cannot just run them for 1-2 hours and
come to any worthwhile conclusions:  You _must_ let them run overnight,
and only then read the results.  Two, be aware that neither memtest86
nor memtest86+ will _always_ find bad RAM.  On rare occasions, bad RAM
simply eludes them.  I'm not sure why.

Iterative compilation of the Linux kernel always, always, _always_ finds
bad RAM -- though you might have to use "make -j" directives to ensure
that you are _fully_ exercising all RAM, rather than just the lower
portion of RAM.
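
A minimal sketch of such an iterative-compile loop, in Python, for
anyone who'd rather not babysit it by hand.  Everything named here is my
own illustration and an assumption, not a standard tool:  the
repeat_build() helper, the number of passes, and the "-j" job count are
all arbitrary and should be tuned to the machine.  A pass that fails on
known-good source, especially at a different spot each time, is the
thing to be suspicious of.

```python
import subprocess


def repeat_build(build_cmd, clean_cmd=None, iterations=10, cwd=None):
    """Run build_cmd repeatedly; return details of every failed pass.

    Hypothetical helper for illustration. Each entry in the returned list
    is (pass_number, return_code, last_lines_of_stderr).
    """
    failures = []
    for i in range(1, iterations + 1):
        if clean_cmd is not None:
            # Clean first, so that every pass recompiles the whole tree.
            subprocess.run(clean_cmd, cwd=cwd, check=False,
                           stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL)
        result = subprocess.run(build_cmd, cwd=cwd, check=False,
                                capture_output=True, text=True,
                                errors="replace")
        if result.returncode != 0:
            # Keep only the tail of stderr; that is where the error shows up.
            tail = "\n".join(result.stderr.splitlines()[-5:])
            failures.append((i, result.returncode, tail))
    return failures


# Example call (assumed source-tree path and job count; adjust both):
#
#   bad = repeat_build(["make", "-j8"], ["make", "clean"],
#                      iterations=10, cwd="/usr/src/linux")
#   for n, code, tail in bad:
#       print("pass", n, "failed with status", code)
#       print(tail)
```

An empty list back means every pass built cleanly.  Raise the "-j"
figure if the build isn't using most of the machine's RAM while it runs.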

I've written about that matter on Conspire, before.  Because I knew the
matter would come up again, I eventually got around to putting the URLs
of that thread on my personal page, http://linuxmafia.com/~rick/ .
Those links, FYI, are:

http://linuxmafia.com/pipermail/conspire/2006-December/002662.html
http://linuxmafia.com/pipermail/conspire/2006-December/002668.html
http://linuxmafia.com/pipermail/conspire/2007-January/002743.html

If, by contrast, you don't _specifically_ suspect bad RAM, but are
merely seeking to stress-test a machine's hardware, I still cannot find
anything I'd recommend above VA Linux Systems's Cerberus Test Control
System, which I discuss under "Burn-in" on
http://linuxmafia.com/kb/Hardware/ .

That suite of tests, which are run in parallel, _includes_ tests
intended to stress-test RAM but also a number of others.  It was what VA
Linux Systems used to vet both new machines just received from the
assembly line and customer machines repaired under warranty (or
otherwise):  putting them under extremely heavy load and monitoring them
for errors, to make sure they were OK.




> I just felt that memtest86
> was better, given that its tests were designed to induce memory
> failures based on the various types of dynamic failure modes. Prior to
> reading memtest86's documentation, I'd only known of static failures.
> By analogy, I thought that if memory locations were squares on a chess-
> board, bad memory was simply a particular square not being able to hold
> its state properly. Not so; there are particular dynamic failures, such
> as, "When these particular eight squares go from being empty to each
> holding a white pawn, then the eighth square frequently fails to ''hold
> its pawn'', instead stubbornly remaining empty. Whereas in all other
> cases, that square holds its state just fine."
> 
> [several minutes later]
> 
> Ok. The thread at:
> 
> o   How to guide: Memtest86+, Prime95, and SP2004
>     http://forums.anandtech.com/messageview.aspx?catid=28&threadid=1901991&enterthread=y
> 
> includes folks speaking about using Prime95 and SP2004 for general
> stability testing, which rings a bell. Though memtest86 was great for
> RAM testing, the second part of the equation when dealing with iffy
> hardware was overall stability testing. That is, not just memory but
> CPU, too. This involves stressing the CPU and hopefully not seeing any
> squirrelly behavior in response. Prime95, I now recall, was good for
> this.
> 
> Note that this anandtech.com thread was simply the first one that I came
> across in a Google search; I'm sure there are thousands of such useful
> threads elsewhere on the web. This one just happens to report tests that
> I myself recall putting systems through a couple of years back.
> 
> David Fox:
> ] I noticed that your case does not contain any fans. I don't know if
> ] they can be added after purchase, or if the motherboard has fan
> ] headers (those are the little plugs that you plug power connections
> ] onto the motherboard).
> 
> If your case has spaces for case fans, I highly recommend them. Cooler
> is better; cooler air translates to longer life of parts. Enermax 80mm
> fans have worked very, very well for me. Again, these might have been on
> the recommendation of Rick or Ross, way back when.
> 
> By the way, the Enermax case fans [1], and the Speeze CPU fan [2] (all
> from newegg.com) that I put in my girlfriend's system back in November
> 2003 are all still going strong. And all are very quiet.
> 
> Regards,
> Eric
> 
> [1] Enermax Thermal Control 80MM cooling fan.
> [2] Speeze CPU Fan Model 5F263B1M3 for AMD/Intel Socket A/370.
> --
> Eric De MUND
> ead at ixian.com



