[sf-lug] Grub question

Michael Paoli Michael.Paoli at cal.berkeley.edu
Sat Mar 24 22:40:01 PDT 2012

I'm presuming we're talking GRUB "2" (not "legacy" grub).
I believe the GRUB_DISABLE_RECOVERY option just prevents grub from
creating menu entries with recovery boot option (e.g. separate
additional entries for Linux kernel with additional boot parameter of
"single" for single user mode) - so that option probably wouldn't
help in the case cited.

First question: beyond the obvious repeated, and potentially
unpredictable, power disruptions, why is it getting "stuck" - presumably
in the boot process?  A few possibilities come to mind - and some
possible ways of addressing them (even if power isn't addressed).

And by "maintenance shell" - is that Linux in, or prompting to go into
single user mode? ... and is that from filesystem(s) too unclean for the
default unattended filesystem check/fix/mounts to succeed?

Anyway, a few things ...

Be sure the default GRUB entry that's booted is
for multiuser mode.  Also, for quicker unattended reboots, make sure
the timeout is "short enough" (but not so short that one can't
reasonably interrupt it if/when one wants to).
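
As a rough sketch of checking that timeout, the Python below parses text
in the KEY=value style used by the GRUB 2 defaults file (commonly
/etc/default/grub, though the location may differ by distribution) and
flags a GRUB_TIMEOUT that looks unsuitable for unattended boots.  The
function names, the sample contents, and the 1 to 10 second window are
illustrative assumptions, not anything GRUB prescribes.

```python
def parse_grub_defaults(text):
    """Parse simple KEY=value lines; comments and blank lines are skipped."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            # Drop surrounding quotes, e.g. GRUB_DEFAULT="0"
            settings[key.strip()] = value.strip().strip("\"'")
    return settings


def timeout_warnings(settings, low=1, high=10):
    """Return warnings about GRUB_TIMEOUT (the low/high window is arbitrary)."""
    value = settings.get("GRUB_TIMEOUT")
    if value is None:
        return ["GRUB_TIMEOUT is not set"]
    try:
        seconds = int(value)
    except ValueError:
        return ["GRUB_TIMEOUT is not an integer: %r" % value]
    if seconds < 0:
        # A negative timeout makes the menu wait for a keypress.
        return ["GRUB_TIMEOUT=%d: menu waits forever, no unattended boot" % seconds]
    if seconds < low:
        return ["GRUB_TIMEOUT=%d: hard to interrupt by hand" % seconds]
    if seconds > high:
        return ["GRUB_TIMEOUT=%d: slows every unattended reboot" % seconds]
    return []


# Made-up sample contents, not taken from the host in question.
SAMPLE = """\
# /etc/default/grub (sample)
GRUB_DEFAULT=0
GRUB_TIMEOUT=-1
"""

settings = parse_grub_defaults(SAMPLE)
for warning in timeout_warnings(settings):
    print(warning)
```

On a real host one would feed it the contents of the actual defaults
file, and after changing that file, regenerate grub.cfg (update-grub on
Debian-style systems, or grub-mkconfig) for the change to take effect.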

Highly recommended, in such scenario (frequent and unpredictable power
outages to host), use filesystems with journaling - that will help both
with integrity/recoverability, and generally significantly speed
boot/recovery time after a crash.

It may be desirable to separate out various filesystems, and to the
extent feasible, have filesystems mounted read-only most of the time.
E.g. /usr (per FHS <http://www.pathname.com/fhs/>, etc.) can be mounted
read-only most of the time.  That may thus speed boot/recovery time, by
having fewer/smaller filesystems that need checking after a system crash
(if filesystem was unmounted or mounted read-only when system crashed,
at reboot it's generally clean and requires no additional checking
before being mounted again).
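
To see which filesystems are currently writable (and so are candidates
for needing a check after a crash), something along these lines could
be used.  It is only a sketch: it reads text in the /proc/mounts
layout, and the sample devices and mount points are made up.

```python
def writable_mounts(mounts_text):
    """Return mount points whose option list contains 'rw'."""
    writable = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # not a device/mountpoint/type/options line
        mount_point, options = fields[1], fields[3].split(",")
        if "rw" in options:
            writable.append(mount_point)
    return writable


# Made-up sample in /proc/mounts layout.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext3 rw,relatime 0 0
/dev/sda2 /usr ext3 ro,relatime 0 0
/dev/sda3 /var ext3 rw,relatime 0 0
"""

print(writable_mounts(SAMPLE_MOUNTS))
```

On a live system, pass it open("/proc/mounts").read() instead of the
sample.  A read-only /usr would need a temporary remount (e.g.
mount -o remount,rw /usr) around software updates.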

For faster reboots, it may also be desirable to have the noauto option
set in /etc/fstab for filesystems that aren't essential for initial
running of multiuser mode (e.g. restoring remote access).  One can then
come up to initial multiuser mode, and further (e.g. rc) scripts can
then handle checking/mounting additional desired filesystems and
starting the applications that need those filesystems - one could
also potentially configure a separate runlevel for that.
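
A small sketch of auditing that split: the Python below reads
fstab-style text and separates filesystems mounted at boot from those
marked noauto (to be checked and mounted later by one's own scripts).
The sample entries, including the /data filesystem, are hypothetical.

```python
def split_by_noauto(fstab_text):
    """Return (boot_time, deferred) lists of mount points from fstab text."""
    boot_time, deferred = [], []
    for raw in fstab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4 or fields[2] == "swap":
            continue  # skip malformed lines and swap entries
        mount_point, options = fields[1], fields[3].split(",")
        if "noauto" in options:
            deferred.append(mount_point)
        else:
            boot_time.append(mount_point)
    return boot_time, deferred


# Hypothetical sample; device names and layout are assumptions.
SAMPLE_FSTAB = """\
# <device>  <mount>  <type>  <options>          <dump>  <pass>
/dev/sda1   /        ext4    errors=remount-ro  0       1
/dev/sda2   /usr     ext4    ro                 0       2
/dev/sda5   none     swap    sw                 0       0
/dev/sdb1   /data    ext4    noauto             0       0
"""

boot_time, deferred = split_by_noauto(SAMPLE_FSTAB)
print("mounted at boot:", boot_time)
print("deferred (noauto):", deferred)
```

The shorter the first list, the less there is that can hold up getting
to multiuser mode and restoring remote access.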

For some filesystem types (e.g. ext2/ext3/ext4), one can tune (tune2fs)
various check parameters - that may help with the filesystems (not)
being checked too (in)frequently at boot/mount, and may potentially
reduce problems.
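
For illustration, here is a sketch that only builds (and does not run)
a tune2fs command line setting the two usual check parameters: -c for
the maximum mount count and -i for the interval between checks.  The
helper name, the placeholder device, and the values 20 and 30 are
assumptions for the example, not recommendations.

```python
def tune2fs_check_args(device, max_mounts, interval_days):
    """Build a tune2fs argument list; nothing is executed here.

    -c: number of mounts after which a check is due
    -i: maximum time between checks (the 'd' suffix means days)
    """
    if interval_days < 0:
        raise ValueError("interval_days must not be negative")
    return [
        "tune2fs",
        "-c", str(max_mounts),
        "-i", "%dd" % interval_days,
        device,
    ]


# Placeholder device name; substitute the real one.
args = tune2fs_check_args("/dev/sdXN", max_mounts=20, interval_days=30)
print(" ".join(args))
```

To apply it one would run the resulting command as root (e.g. via
subprocess.run(args, check=True)); tune2fs -l on the device shows the
current mount count and check interval settings.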

One might want to set configuration option(s) and/or filesystems to
force a complete check upon (re)boot.  That might take (possibly much)
longer to boot, but may keep filesystems in better shape, particularly
with repeated unpredictable host power outages (including also, e.g.,
additional power outages while filesystem checks are in progress, etc.).

Watchdog timers - it may be useful to enable watchdog timers - that
could potentially prevent some system hangs - not only more conventional
hangs/lockups, but if, e.g., power glitch puts hardware in funky state
and system hangs on stuck I/O ... if system is still "alive enough" that
watchdog timer works, that would then force a reset, from which things
would presumably (likely) then recover.
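
As a minimal sketch of the userspace side, assuming the standard Linux
watchdog character device (typically /dev/watchdog) and a loaded
watchdog driver: once the device is opened, it must be written to
regularly or the hardware resets the machine.  The function names and
the 10 second interval are illustrative; the demonstration at the
bottom uses an in-memory buffer rather than real hardware.

```python
import io
import time


def pet_watchdog(dev, rounds, interval=10.0, sleep=time.sleep):
    """Write to an already-open watchdog device `rounds` times."""
    for _ in range(rounds):
        dev.write(b"\0")  # any write resets the countdown
        dev.flush()
        sleep(interval)


def run(path="/dev/watchdog", rounds=6):
    """Open the real device (needs root); opening it arms the timer."""
    with open(path, "wb", buffering=0) as dev:
        pet_watchdog(dev, rounds)
        # "Magic close": many drivers disarm the timer if 'V' is the
        # last byte written before close (unless built with nowayout).
        dev.write(b"V")


# Dry run against an in-memory buffer instead of real hardware.
fake = io.BytesIO()
pet_watchdog(fake, rounds=3, interval=0, sleep=lambda seconds: None)
print(len(fake.getvalue()), "keepalive writes")
```

In practice one would more likely run an existing watchdog daemon than
write one's own, and the interval must be comfortably shorter than the
driver's timeout.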

Some type of remote management could come in quite handy (e.g. IPMI).
One could then potentially remotely reset host, effectively access (at
least text) console, etc.

And if not already in place, monitoring would be good, to know when some
issue needs to be attended to (whether it can be fixed remotely ...
or not).

And, ... when in doubt, test.  :-)

> From: conor.list at gmail.com
> Date: Sat Mar 24 16:37:48 PDT 2012
> /etc/grub/default.conf has a setting 'GRUB_DISABLE_RECOVERY' that  
> looks promising.
> On Mar 24, 2012, at 3:11 PM, Eric Walstad <eric at ericwalstad.com> wrote:
> > I have a linux box that functions as a remote weather station.  Long
> > story short, sometimes it gets powered off a few times in a row
> > without doing a proper shutdown.  I think that is what is causing Grub
> > to boot into a maintenance shell, which means I have to drive to the
> > remote site to attach a keyboard and monitor and type 'reboot' to get
> > it to go through the normal bootup.
> >
> > I'm working on the problem of ensuring a clean shutdown when power is
> > about to go away, but in the mean time, do any of you know how to keep
> > Grub from behaving this way?  I'd rather it always boot when power is
> > applied.
