[conspire] Re: RHL 9 Install problems

Rick Moen rick at linuxmafia.com
Sun Jul 13 08:18:27 PDT 2003


Quoting Greg Dougherty (rhl at molecularsoftware.com):

> Ok, I ran e2fsck on my 26 gig partition.  It took about 20 hours.  It
> did the non-destructive read-write test, reported "vdone", and has now
> appeared to hang.  [...]

Just to let the other shoe drop on your long-running woes, for the
benefit of list readers:  Greg was at Saturday's CABAL meeting with his
machine, and we ran through a series of installation attempts using my
RH9 CDs, getting SIG11 errors and other failures at numerous points,
usually during RAM-intensive operations such as the RH installer
handling packages, or kernel compiles.  The tests applied to the machine
included running memtest86 from an LNX-BBC disk for something like an
hour with both RAM sticks in place, which found no problem with the RAM.

Eventually, we tried sundry operations with just one RAM stick at a
time, and problems showed up consistently with one stick installed, but
never with the other installed.  This seemed definitive:  It's a bad
stick of RAM.

Some points worth noting:  (1) I mentioned that you really need to run
memtest86 overnight to have a decent chance of catching RAM defects with
it.  Although it'll show up really gross defects right away, others will
slip right past a 1-hour check.  

(2) The fact that MS-Windows seemed to work fine on a RAM stick that
Linux seems to have problems with doesn't tell you anything about the
suspect RAM.  In fact, RAM-intensive operations under MS-Windows on such
a configuration were probably silently corrupting data on the fly;
MS-Windows just wasn't showing any sign of it.

(3) As Heather Stern pointed out at the meeting, it's understandable
that some RAM defects slip past memtest86, because, much as it tries to
stress the RAM in order to test it, the utility does so using RAM-to-RAM
operations only, not other access modes at the same time such as DMA
(direct memory access) hardware calls to move data between disk and RAM.
But there's a traditional torture-test that does exactly that:  kernel
compiles -- which in fact _are_ what most unambiguously showed us where
your problem was coming from.
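
For anyone who wants to automate that sort of torture test, here is a
minimal sketch in Python of the general idea: run the same build command
several times and flag any run that dies from a signal or whose output
mentions a compiler crash.  The names stress, looks_like_crash and
CRASH_MARKERS are hypothetical, and the marker strings are an assumption
about typical gcc/make output, not an exhaustive list.  Builds can fail
for reasons that have nothing to do with RAM, so treat a flagged run as
a hint, not a diagnosis.

```python
import subprocess
import sys

# Strings that often show up in build output when a compiler process dies
# from a memory fault. Assumed for illustration; not an exhaustive list.
CRASH_MARKERS = ("signal 11", "segmentation fault", "internal compiler error")


def looks_like_crash(returncode, output):
    """Return True if a run was killed by a signal or its output mentions a crash."""
    if returncode < 0:  # on POSIX, subprocess reports death-by-signal as -signum
        return True
    text = output.lower()
    return any(marker in text for marker in CRASH_MARKERS)


def stress(cmd, runs):
    """Run cmd `runs` times; return a list of (run_number, returncode, crashed)."""
    results = []
    for n in range(1, runs + 1):
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # fold stderr into stdout for scanning
            text=True,
            errors="replace",          # tolerate odd bytes in build logs
        )
        crashed = looks_like_crash(proc.returncode, proc.stdout)
        results.append((n, proc.returncode, crashed))
    return results
```

To use it, pass a command that does a full rebuild each time (for
instance a wrapper script of your own that cleans the tree and then
compiles), since a second plain "make" in an already-built tree does
almost no work.  Failures that turn up at different points on different
runs of an identical build are the classic sign of flaky hardware.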

I'm very glad we were able to reach a definitive diagnosis.  I'll bet
you're glad to be done with it.

-- 
Cheers,             "Don't use Outlook.  Outlook is really just a security
Rick Moen            hole with a small e-mail client attached to it."
rick at linuxmafia.com                        -- Brian Trosko in r.a.sf.w.r-j


