[conspire] Waah, my little server crashed...

Rick Moen rick at linuxmafia.com
Sat Nov 4 13:45:44 PST 2006

Quoting Ed Biow (biow at sbcglobal.net):

> I have a little Debian Sarge machine that I generally leave on all the 
> time, a $107.00 VIA Samuel 2 Asus Terminator jobbie that does yeoman 
> work as my local http, ftp and file server, plus light desktop duties. 
> (It is a bit pokey despite 512 MB of SDRAM, so it isn't my preferred 
> workstation).  But it is handy and very reliable, and hopefully goes 
> easy on the juice.  This morning I tried to access it from another box 
> and it wasn't responding, so I went downstairs and, lo and behold it was 
> off.

Well, that's a real poser, because normally _software_ problems would
not cause the machine to shut down and power off.  They might make the
machine hang with a kernel panic message, or have critical processes
segfault, or just seize up and give no indication of what's wrong, or
reboot -- but all of those fault outcomes would tend to leave the
machine verifiably powered up although not necessarily "running" in the
functional sense.

So, I'm concluding that it's pretty definitively a hardware problem. 
Of course, it could have been a one-time thing.
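One way to test the hardware theory from the software side: an orderly shutdown normally leaves a "shutdown" record in /var/log/wtmp, which `last -x` displays, while a sudden loss of power leaves a "reboot" record with no shutdown before it. Here is a minimal Python sketch, assuming the classic `last -x` output format (system records begin with "reboot" or "shutdown"). The function name unclean_boots is made up for illustration:

```python
import subprocess

def unclean_boots(last_output: str) -> list:
    """Return the 'reboot' records in `last -x` output that have no
    'shutdown' record immediately before them in time."""
    # Keep only system boot/shutdown records; `last -x` also lists user
    # logins and runlevel changes, which this sketch ignores.
    records = [line for line in last_output.splitlines()
               if line.startswith(("reboot ", "shutdown "))]
    records.reverse()  # `last` prints newest first; walk oldest first
    flagged = []
    previous = None  # the oldest record can't be judged, so it's skipped
    for line in records:
        if (line.startswith("reboot ") and previous is not None
                and not previous.startswith("shutdown ")):
            flagged.append(line)
        previous = line
    return flagged

if __name__ == "__main__":
    try:
        # Assumption: the `last` command is installed and wtmp is readable.
        result = subprocess.run(["last", "-x"], capture_output=True, text=True)
    except OSError as err:
        print("could not run last:", err)
    else:
        for line in unclean_boots(result.stdout):
            print("boot with no logged shutdown before it:", line)
```

A flagged boot is consistent with the power simply going away, but wtmp rotation and unusual init setups can both muddy the picture.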

> Anyway, I'm trying to figure out why it shut down, whether it is a 
> failing component or an OS glitch or just a momentary lapse of power.
> I figure the first place to look is /var/log, but I really don't know 
> where to look.

Indeed, /var/log/messages often doesn't have a lot other than time marks
in it.  /var/log/syslog is worth skimming just to be thorough, as are
daemon.log, dmesg, and kern.log.  However, don't be surprised if the root
cause simply wasn't visible to your operating system and software,
because it happened at a hardware level that software can't see.  For
example, Deirdre just mentioned to me that this could easily be a sign of
a weak or failing power supply unit (PSU).
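Those time marks are actually useful here: if syslogd is writing its periodic "-- MARK --" lines, a long silence in /var/log/messages brackets when the box went down. Here is a rough Python sketch, assuming the classic "Nov  4 13:45:44 host ..." timestamp format, a 30-minute threshold, and a year you supply yourself (classic syslog lines don't record one). The name quiet_gaps is hypothetical:

```python
import re
from datetime import datetime, timedelta

# Classic syslogd line prefix, e.g. "Nov  4 13:45:44 hostname ..."
STAMP = re.compile(r"^([A-Z][a-z]{2}) +(\d{1,2}) (\d\d:\d\d:\d\d) ")

def quiet_gaps(lines, year, min_gap=timedelta(minutes=30)):
    """Yield (last line before a gap, first line after it, gap length)."""
    prev_time, prev_line = None, None
    for line in lines:
        m = STAMP.match(line)
        if not m:
            continue  # skip lines without a recognizable timestamp
        month, day, clock = m.groups()
        t = datetime.strptime(f"{year} {month} {day} {clock}",
                              "%Y %b %d %H:%M:%S")
        if prev_time is not None and t - prev_time >= min_gap:
            yield prev_line.rstrip("\n"), line.rstrip("\n"), t - prev_time
        prev_time, prev_line = t, line

if __name__ == "__main__":
    # Assumptions: path and year; rotated logs (messages.0, *.gz) aren't read.
    try:
        with open("/var/log/messages", errors="replace") as f:
            for before, after, gap in quiet_gaps(f, year=2006):
                print(f"{gap} of silence:\n  {before}\n  {after}")
    except OSError as err:
        print("could not read log:", err)
```

The last line before the gap is roughly when the machine went dark.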

> Since the system is on all the time I'm thinking maybe the drive is 
> beginning to have problems, so I'd like to check drive integrity.
> Should I check the hard drive surface using the proprietary utility that 
> came with my disk?  Or should I reboot to a live CD and run something like:
> fsck -t ext3 /dev/hdaX

It's always good to know how to check hard disks.  One way is "fsck -c",
which not only runs the badblocks utility but also makes sure that any
bad blocks found are mapped out and not used from then on.  (Run it on an
unmounted filesystem, e.g., from a live CD as you suggest.)  The "Hard
Drive Utilities" entry on http://linuxmafia.com/kb/Hardware has
hyperlinks to all of the manufacturers' HD-diagnosis utilities for their
models, and those are worth knowing about, as well.  In addition,
smartmontools can listen in on your HD's internal self-checking
routines, and help track HD health and predict failure.
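For the smartmontools route, `smartctl -H` asks the drive for its own overall health verdict. Here is a small Python wrapper, assuming the drive is /dev/hda, you're running as root, and the output uses the wording smartctl prints for ATA drives ("...self-assessment test result: PASSED") or SCSI drives ("SMART Health Status: OK"). The helper smart_health is hypothetical:

```python
import subprocess

def smart_health(smartctl_output: str):
    """Interpret `smartctl -H` output: True if the drive reports healthy,
    False if it reports failure, None if no verdict line was found."""
    for line in smartctl_output.splitlines():
        # ATA drives: "SMART overall-health self-assessment test result: PASSED"
        if "overall-health self-assessment test result:" in line:
            return line.rsplit(":", 1)[1].strip() == "PASSED"
        # SCSI drives: "SMART Health Status: OK"
        if line.startswith("SMART Health Status:"):
            return line.split(":", 1)[1].strip() == "OK"
    return None

if __name__ == "__main__":
    device = "/dev/hda"  # assumption: first IDE disk; adjust for your box
    try:
        # Typically needs root, plus the smartmontools package installed.
        out = subprocess.run(["smartctl", "-H", device],
                             capture_output=True, text=True).stdout
    except OSError as err:
        print("could not run smartctl:", err)
    else:
        verdict = smart_health(out)
        print(device, {True: "healthy", False: "FAILING",
                       None: "no verdict"}[verdict])
```

A PASSED result doesn't prove the disk is fine, but a FAILED one is worth taking seriously.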

However, none of those are very likely to be relevant to your problem,
because I just cannot easily conceive of a hard drive problem that would
cause the machine to power off.  HD issues tend to have completely
different sorts of symptoms.
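For comparison, disk trouble usually announces itself in kern.log as I/O errors and controller complaints, not as a silent power-off. Here is a hedged Python sketch that searches for a few message patterns typical of 2.6-era IDE and block-layer code. The pattern list and the name disk_complaints are illustrative assumptions, not a complete catalogue:

```python
import re

# Illustrative, not exhaustive: messages 2.6-era IDE and block-layer
# code tends to log when a disk misbehaves.
DISK_TROUBLE = re.compile(
    r"dma_intr: (status|error)=|end_request: I/O error|UncorrectableError")

def disk_complaints(lines):
    """Return the log lines that look like disk I/O trouble."""
    return [line.rstrip("\n") for line in lines if DISK_TROUBLE.search(line)]

if __name__ == "__main__":
    try:
        with open("/var/log/kern.log", errors="replace") as f:
            hits = disk_complaints(f)
    except OSError as err:
        print("could not read log:", err)
    else:
        print(len(hits), "suspicious disk messages")
        for line in hits[-10:]:  # show only the most recent few
            print(" ", line)
```

An empty result fits the view that the disk isn't the culprit here.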

> Maybe I should complement that with a nice couple of hours round of 
> memtest, as well.

Again, you could reasonably let memtest86 run overnight from, say, a
Knoppix live CD, but I really doubt that memory problems are your root
cause.  Memory problems can cause random reboots, segfaults, SIG11
errors, silent data corruption, or other runtime weirdness.  Memory
problems can even cause the machine to not power on, or produce no
video, when you hit the power switch.  However, I'm not aware of a RAM 
problem that would cause the machine to power off.

> Or would the path of prudence be to just back up my data and hope it 
> doesn't happen again?

1.  Backing up your data is good on its independent merits.  2.  If it
happened once and never again, then just blame sunspots or a Disturbance
in the Force and worry about global warming, instead.  

Remember, once is accident.  Twice is coincidence.  Three times is enemy
action:  You might have a new motherboard or PSU in your future.  Or not.
But don't rush out and start buying new parts until you have more to go on.

(I could be talking out /dev/ass, so use your own best judgement, not to
mention your eyes and ears, which are often your best diagnostic tools.)
