[conspire] Crashing Problems, Suspect IRQ Mismatch, Maybe BIOS

Daniel Gimpelevich daniel at gimpelevich.san-francisco.ca.us
Wed May 23 11:12:15 PDT 2007


On Tue, 22 May 2007 19:04:39 -0700, Tim Utschig wrote:

> On Tue, May 22, 2007 at 03:52:17PM -0700, mark at weisler-saratoga-ca.us wrote:
>> I'm having problems with a box I just assembled and would appreciate
>> help if someone has time to consider this. Very briefly, the mouse and
>> keyboard freeze after about one to five minutes of operation.
>> Ctl-Alt-Delete is not recognized and a hard reboot is required.
>> Sometimes, as it fails, a faint blue stripe appears across the screen.
> 
> If you have a null modem cable and another machine with a serial port
> available, you could try booting with the kernel outputting to the
> serial console to try to catch any kernel messages related to the
> lock-up.
> 
> For example, append " console=ttyS0,57600" to the kernel command line
> via grub/lilo, and have minicom configured (Ctrl+A, P) to match ("57600
> 8N1") and to capture to a file (Ctrl+A, L) on the machine at the other
> end of the cable.
> 
> Also you might try, if possible, to pull a memory module or two and see
> if the problem disappears.
> 
> Oh, and you're not using a proprietary ATI driver by any chance, are
> you?

The problem with using a serial console to catch a freezing problem like
this is that serial output tends to be buffered, so the all-important
last line or two of the messages might never arrive. If you do try it
anyway, capturing to a file isn't required, since you'll see the messages
directly on the other machine's screen.

The fglrx driver is notorious for causing crashes exactly as you describe,
but the dmesg in your pastebin clearly shows you're using vesa, so that
can't be it in your case.

Even with the pastebin, I haven't seen enough data about your box to
suspect an IRQ problem, but assuming you've noticed things I haven't in
the considerable time you've spent on it, there are certainly things to
try.

Depending on the configuration, the BIOS may be involved in assigning
IRQs, so if there really have been eight revisions since the BIOS version
you're using, an update may change things. NEVER, EVER use Windows to
accomplish this!
Although it is possible to flash a BIOS from within a booted Linux, it's
often not particularly convenient, and probably a bad idea when there's a
possibility of a freeze-up. A FreeDOS boot works well, as long as you
choose the option with no drivers. If that boot is from a CD, it's best to
have the updater already downloaded to the only primary FAT partition on a
hard disk. You do have only one of those, right?

Your current kernel arguments include:
nosound noapic noscsi nodma noapm nousb nopcmcia nofirewire noagp nomce
nodhcp nodbus nocpufreq nobluetooth

Get rid of all that junk in there. Unless you know exactly what you're
disabling and why, you may be introducing just as many unknowns as you
would if support for some specific piece of hardware weren't working.
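To check what the running kernel was actually booted with, you can read
/proc/cmdline and drop the options listed above. This is a minimal
sketch; the names strip_disables and current_cmdline are hypothetical.
It only produces a cleaned-up string and does not touch the grub/lilo
configuration, which you would still edit by hand.

```python
# Hypothetical helper: drop the blanket "no..." options quoted above from a
# kernel command line, keeping every other argument in its original order.

# Exactly the options listed in the post; nothing else is removed.
BLANKET_DISABLES = frozenset({
    "nosound", "noapic", "noscsi", "nodma", "noapm", "nousb", "nopcmcia",
    "nofirewire", "noagp", "nomce", "nodhcp", "nodbus", "nocpufreq",
    "nobluetooth",
})


def strip_disables(cmdline: str) -> str:
    """Return cmdline with the blanket-disable options removed."""
    return " ".join(arg for arg in cmdline.split() if arg not in BLANKET_DISABLES)


def current_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the arguments the running kernel was booted with."""
    with open(path) as f:
        return f.read().strip()
```

On the box itself, print(strip_disables(current_cmdline())) shows what
would be left for the bootloader entry once the junk is gone.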

After you have done that, try the following kernel arguments in turn to
see how the IRQs are affected:
pci=bios,acpi
pci=bios,noacpi
pci=nobios,acpi
pci=nobios,noacpi
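To compare those four boots, one approach is to save a copy of
/proc/interrupts after each one and diff the device-to-IRQ assignments.
The sketch below assumes the 2.6-era layout of that file: a header row of
CPU columns, then lines like "IRQ: counts... controller dev1, dev2" with
a single controller-name token. The layout has varied between kernel
versions, so check it against your own output. All function names here
are hypothetical.

```python
from __future__ import annotations


def parse_interrupts(text: str) -> dict[int, list[str]]:
    """Map each numeric IRQ to the device names registered on it.

    Assumes one controller-name token (e.g. "XT-PIC", "IO-APIC-level")
    after the per-CPU counters; rows like NMI/LOC/ERR are skipped.
    """
    lines = text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row: "CPU0 CPU1 ..."
    irqs: dict[int, list[str]] = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        label = label.strip()
        if not sep or not label.isdigit():
            continue
        fields = rest.split()
        # Skip the per-CPU counters and the controller name.
        devices = " ".join(fields[ncpus + 1:])
        irqs[int(label)] = [d.strip() for d in devices.split(",") if d.strip()]
    return irqs


def shared_irqs(irqs: dict[int, list[str]]) -> dict[int, list[str]]:
    """IRQs that have more than one device on them."""
    return {irq: devs for irq, devs in irqs.items() if len(devs) > 1}


def device_irqs(irqs: dict[int, list[str]]) -> dict[str, int]:
    """Invert the mapping: device name -> IRQ (last one wins on duplicates)."""
    return {dev: irq for irq, devs in irqs.items() for dev in devs}


def compare(before: str, after: str) -> dict[str, tuple[int | None, int | None]]:
    """Devices whose IRQ differs between two saved /proc/interrupts snapshots."""
    a = device_irqs(parse_interrupts(before))
    b = device_irqs(parse_interrupts(after))
    return {dev: (a.get(dev), b.get(dev))
            for dev in sorted(set(a) | set(b))
            if a.get(dev) != b.get(dev)}
```

Sharing an IRQ is not a fault by itself on PCI, so shared_irqs is only a
starting point for spotting which devices moved between boot variants.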

Once the IRQs look right, you may want to fix the timing issue shown in
your dmesg, as it may cause the clock to go screwy. The "clock=pit"
kernel argument typically takes care of that satisfactorily.

If it turns out you were wrong about the IRQ problems causing the
freeze-ups, I would investigate temperature next. You'll need to install
the "lm-sensors" and "xsensors" packages IIRC, then run "sensors-detect"
and "xsensors" to check.
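xsensors only shows current readings, which are lost when the box locks
up. As a rough complement, a small logger could read the same hwmon
values from sysfs and flush each line to disk, so the last readings
before a freeze have a better chance of surviving the hard reboot. This
is a sketch under assumptions: it expects the sysfs hwmon layout, where
temp*_input files hold millidegrees Celsius under
/sys/class/hwmon/hwmonN/ (some older drivers put them under
hwmonN/device/ instead), and the function names are hypothetical. You
still need sensors-detect to find and load the right driver modules
before these files appear.

```python
from __future__ import annotations

import glob
import os
import time


def read_temps(root: str = "/sys/class/hwmon") -> dict[str, float]:
    """Return degrees C keyed by path relative to root, minus "_input".

    hwmon reports temperatures in millidegrees Celsius.
    """
    temps: dict[str, float] = {}
    patterns = (os.path.join(root, "hwmon*", "temp*_input"),
                os.path.join(root, "hwmon*", "device", "temp*_input"))
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            try:
                with open(path) as f:
                    millideg = int(f.read().strip())
            except (OSError, ValueError):
                continue  # sensor missing or returned garbage; skip it
            key = os.path.relpath(path, root)[: -len("_input")]
            temps[key] = millideg / 1000.0
    return temps


def log_temps(logfile: str, interval: float = 10.0,
              root: str = "/sys/class/hwmon",
              samples: int | None = None) -> None:
    """Append one timestamped line of readings per interval.

    Runs forever unless samples is given. Each line is fsync'ed so it is
    more likely to be on disk when the machine freezes.
    """
    taken = 0
    while samples is None or taken < samples:
        readings = " ".join(f"{name}={deg:.1f}"
                            for name, deg in sorted(read_temps(root).items()))
        with open(logfile, "a") as f:
            f.write(time.strftime("%Y-%m-%d %H:%M:%S ") + readings + "\n")
            f.flush()
            os.fsync(f.fileno())
        taken += 1
        if samples is None or taken < samples:
            time.sleep(interval)
```

Left running in a terminal, log_temps("/root/temps.log") keeps a
readable record; after the next lock-up, the tail of that file shows
whether temperatures were climbing beforehand.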




