[conspire] debugging (and initramfs, and ...)

Sat Dec 9 04:56:29 PST 2017

[okay, so I'm behind in my conspire reading 8-O]

So ... "boot" debugging ...
For most Linux distributions, between boot loader (typically grub)
and init (whatever the proper init PID 1 process is once things are
well on their way to boot on the "real" root filesystem 'n all that),
there generally is initramfs.  I recently had occasion to do a wee bit
of diagnostics/debugging on that.  There are some good
resources/information on that.  I think I started around (from web
search) here:
https://wiki.debian.org/InitramfsDebug
(and even updated that a teensy bit)
and continued on to some additional resources/information
(I think man pages and/or ... peeking again, ah yes ...)
initramfs-tools(8), etc.  So, ... "too early" to have the stuff
logged to system log files?  Uhm, ... well, ... not quite :-)
The initramfs part of the boot process (which typically comes in
between boot loader - e.g. grub - and the hand-off to the "real" init
process with the "real" root filesystem mounted) can do some logging,
notably to files which apparently can be left under /run and persist
through boot - but not across (re)boots.
Particulars on methods, location, behaviors, etc. may vary depending
upon your distribution and vintage thereof, but that's at least the
general non-ancient Debian behavior, and I'd think most Debian derivatives
would do likewise (and more Linux distributions are Debian or derived from
Debian than any other distribution).  Many other distributions
similarly use grub --> initramfs --> "real" init, so may behave
anywhere from somewhat to rather similarly, though the particulars
may vary (perhaps quite significantly).
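
For poking at whatever the initramfs stage left behind, here is a minimal Python sketch. It assumes the logs land under /run/initramfs (on Debian, as far as I know, booting with the "debug" kernel parameter writes /run/initramfs/initramfs.debug); other distributions may put things elsewhere. The function name list_boot_logs is made up for this illustration.

```python
import os


def list_boot_logs(run_dir="/run/initramfs"):
    """Return (path, size_in_bytes) for each regular file under run_dir.

    run_dir is an assumption (Debian-style location); adjust as needed.
    A missing directory simply yields an empty list.
    """
    found = []
    for root, _dirs, files in os.walk(run_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.isfile(path):
                found.append((path, os.path.getsize(path)))
    return sorted(found)


if __name__ == "__main__":
    for path, size in list_boot_logs():
        print(f"{size:>10}  {path}")
```

Since /run is volatile, anything worth keeping should be copied elsewhere before the next reboot.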

And the glitch I was solving?  Turned out the
initramfs filesystem/environment wasn't "rich" enough - was missing
some critical stuff to be able to get everything lined up for the
"real" root filesystem.  Worked perfectly fine if I booted from, e.g.
USB into "rescue" environment - that had everything needed to get to
the "real" root.  Checking further on what was in the initramfs, and
how it was constructed, for my newer, slightly more complex environment
(had long been using LUKS & LVM, recently also added an mdadm RAID
layer), the fix was fairly simple ... uncomment one line in one
config file, create/update the initramfs again, then the LUKS related
stuff was again in the initramfs, and all was relatively well again
(could at least fully boot direct from the drive again ... probably
still want to do some tweak(s) to make it an easier more efficient
process ... but at least it works highly reliably now).
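
A rough sketch of the "is the needed stuff actually in the initramfs?" check, in Python. It works on a plain listing of paths, one per line, such as what Debian's lsinitramfs prints for an initrd image. The function name missing_components and the default tool names (cryptsetup, lvm, mdadm) are assumptions for a LUKS + LVM + mdadm stack, not something taken from any particular config.

```python
def missing_components(listing, needed=("cryptsetup", "lvm", "mdadm")):
    """Return the names in `needed` that match no path in `listing`.

    A name matches when it equals the final path component, so
    "sbin/lvm" and "usr/sbin/lvm" both satisfy "lvm".
    """
    basenames = {
        line.strip().rsplit("/", 1)[-1]
        for line in listing
        if line.strip()
    }
    return [name for name in needed if name not in basenames]


if __name__ == "__main__":
    # Hypothetical listing, e.g. saved from: lsinitramfs <initrd image>
    sample = ["bin/busybox", "sbin/lvm", "sbin/mdadm"]
    print(missing_components(sample))  # the LUKS tooling is absent here
```

An empty result only says the binaries are present; it doesn't prove the matching boot scripts or kernel modules made it in too.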

So, anyway, although boot loader (e.g. grub) may not offer anything
in the way of logging, once things are handed over (typically) to
initramfs part of the boot process, there may be some fairly
nice handy options for troubleshooting, logging, etc.  Well, at
least if one makes it some reasonable bit into the initramfs
stage of the boot process.

And ... systemd 8-O ... if one is using systemd there's various
documentation and sets of procedures for troubleshooting that,
notably including if it fails somewhere in the boot process.
For better (and/)or worse, I'm using systemd on all but one of
my Debian systems - I'll probably try systemd on that one host
again at some point, but last I did, trying to get it to work with
systemd was just burning too much time/resource, and I didn't want
to continue on that time suck - so I continued with what I'd
been using before - and that worked perfectly fine.  Since that
system went through fairly major (Debian 8.9 --> 9.2) upgrade
quite recently, I may give systemd another shot - to see if I can
get it to behave reasonably on that host without too much time
suckage to get it to do so - and without too much problematic or
odd systemd behavior ... but trying systemd again on that host is
not exactly a high priority for me.  Meanwhile ...
egad, systemd - another Debian host, running systemd ... yeah, systemd
often radically violates principle of least surprise.  Egad. ...
trying to mount something else as /boot, it's got this friggin'
crazy thingy that automagically mounts /boot (if it's separate filesystem,
and if it can), and, egad, mount something else there and it basically
behaves: "Nope, not that one - umounting" ... WTF!  Yeah, had to
at least temporarily disable some boot-something-or-other systemd unit
that was responsible for that craziness.  And, yeah, it didn't even
care what was in /etc/fstab ... maybe it read that earlier, but once
its mind was made up, that was that.  <sigh>  Anyway, after smacking systemd with
a clue-by-four about /boot, and properly squaring away /boot,
after that all was "fine" again with systemd on that host.
Well ... at least Debian does make systemd somewhat more sane ...
more reasonably separate things out as feasible, etc.
E.g. many distributions (such as Fedora and derivatives) have long
since given up on allowing /usr to be a separate filesystem.
Debian still fully allows and supports /usr as separate filesystem (yeah!).
However ... if one is using systemd, /usr must be mountable and mounted
for the system to fully transition even up to runlevel 1 or S (single
user mode).

And yes, for "server" class systems/environments (and more capable /
experienced sysadmins, and not tiny embedded/container environments),
it's generally better to have separate filesystems,
e.g. /home, /var, etc. (but not /etc).  Though for, e.g. "newbies"
I'll often recommend just one big root (/), or /boot + root (/)
if the distribution defaults to using LVM.  Not (generally) as
manageable/flexible that way, but much simpler, and also being
the default for the distribution, is often easy(/ier) for the
newbie to follow documentation, semi-random web suggestions, etc.
that often are based upon or presume such default(s).

So, e.g. my personal laptop, I've notably got filesystems:
/, /boot, /home, /usr, /var, ... plus quite a few others,
but that's "just" the key non-volatile local storage ones.
Some also advocate that /var/log and/or /var/tmp be separate
filesystem(s) - notably to protect space in /var/log (or /var
more generally) from whatever might go on in /var/tmp.
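
To see which of those directories really are separate filesystems on a given box, a small Python sketch along these lines may help. It parses text in the /proc/mounts format (device, mount point, type, ...). The function name separate_filesystems and the default directory list are just illustrative assumptions.

```python
def separate_filesystems(mounts_text,
                         wanted=("/", "/boot", "/home", "/usr", "/var")):
    """Map each directory in `wanted` to (device, fstype), or None.

    None means the directory is not a mount point of its own, i.e. it
    lives on whatever filesystem holds its parent directory.
    """
    mounted = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            device, mount_point, fstype = fields[:3]
            # Later entries override earlier ones for the same mount point.
            mounted[mount_point] = (device, fstype)
    return {directory: mounted.get(directory) for directory in wanted}


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        for directory, info in separate_filesystems(f.read()).items():
            print(directory, info if info else "(not a separate filesystem)")
```
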
There's also /tmp as separate filesystem - using tmpfs :-)
And swap, I also do under LVM ... notably *much* easier to
add more or remove or whatever - none of the hassles of mucking
with partitions and reboots to make adjustments to swap.
Interesting bit: tmpfs is one of the few filesystem types that
can be reduced in size while it's mounted.  Sometimes when I want
large performance efficient temporary filesystem space, I'll
temporarily add more swap, grow /tmp, and when I'm done, shrink
/tmp back to its nominal size, and remove the temporarily added
extra swap.
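
The grow-then-shrink dance can be sketched in Python roughly as below. It only builds the command lines and does not run anything. The volume group and logical volume names (vg0, tmpswap) and the sizes are hypothetical, and the function names are made up for illustration. The commands themselves (lvcreate, mkswap, swapon, mount -o remount,size=..., swapoff, lvremove) are the usual tools for this, all needing root.

```python
def grow_tmp_commands(vg, lv, swap_size, tmp_size, mount_point="/tmp"):
    """Commands to add a temporary swap LV, then enlarge the tmpfs."""
    dev = f"/dev/{vg}/{lv}"
    return [
        ["lvcreate", "-L", swap_size, "-n", lv, vg],
        ["mkswap", dev],
        ["swapon", dev],
        ["mount", "-o", f"remount,size={tmp_size}", mount_point],
    ]


def shrink_tmp_commands(vg, lv, tmp_size, mount_point="/tmp"):
    """Commands to shrink the tmpfs back, then drop the temporary swap."""
    dev = f"/dev/{vg}/{lv}"
    return [
        ["mount", "-o", f"remount,size={tmp_size}", mount_point],
        ["swapoff", dev],
        ["lvremove", "-y", f"{vg}/{lv}"],
    ]


if __name__ == "__main__":
    # Hypothetical names and sizes; review before running any of these.
    for cmd in grow_tmp_commands("vg0", "tmpswap", "8G", "12G"):
        print(" ".join(cmd))
    for cmd in shrink_tmp_commands("vg0", "tmpswap", "2G"):
        print(" ".join(cmd))
```

Order matters on the way back down: shrink /tmp first, then swapoff, so there's somewhere for swapped-out pages to go.
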
And for many(/most?) distributions, hibernate (suspend to disk)
defaults to using swap, and if it properly works, it also works
just fine if swap is under LVM (heck, or even LVM atop LUKS!  :-)) -
yes, I was quite pleasantly surprised when I found that works* - wasn't
expecting that it would.
*works ... notwithstanding an unrelated issue I have with that -
video blanks and never comes back on resume - regardless of where
swap / resume storage is ... maybe some day I'll fix that, but not
high on the priorities list.

> Message: 1
> Date: Sat, 11 Feb 2017 19:40:57 +0000 (UTC)
> From: Paul Zander <paulz at ieee.org>
> To: Conspire List <conspire at linuxmafia.com>
> Subject: [conspire] debugging
> Message-ID: <1019574257.3572985.1486842057529 at mail.yahoo.com>
> Content-Type: text/plain; charset=UTF-8
>
> I am running a desktop with Debian Testing and did an upgrade about  
> 2 weeks ago.  Recently I have encountered a couple of "glitches"  
> during boot up.  I can bring the various log files to cabal to  
> examine, but which files?
>
> Glitch #1.   After grub, there is a message about "No Symbol Table".  
>  I was going to manually record the exact words, but the warning  
> timed out and proceeded.  I have successfully rebooted numerous times.
>
> Glitch #2.   After running a week or more, I decided to re-boot.   
> The PC is dual-boot and there was something I couldn't do with Wine.  
>  I made a point of closing all of the applications and windows  
> before shutdown.  Later, during the re-boot of Debian, the boot  
> screen froze with just three lines of text.  Lots of disk access,  
> but several minutes on nothing on the screen.  I turned off power.   
> This morning

> Message: 4
> Date: Sat, 11 Feb 2017 15:54:00 -0800
> From: Rick Moen <rick at linuxmafia.com>
> To: conspire at linuxmafia.com
> Subject: Re: [conspire] debugging
> Message-ID: <20170211235400.GJ609 at linuxmafia.com>
> Content-Type: text/plain; charset=utf-8
>
> Quoting Paul Zander (paulz at ieee.org):
>
>> So a very specific question, is there any log file I should capture
>> that be helpful for diagnoses?
>
> Alas, if you think about it, there's nothing capable of doing logging at
> that moment, unless the bootloader itself does logging, which is a lot
> to expect of a bootloader -- because nothing else is yet running.
>
> Until the OS kernel and an init process and the root filesystem are
> loaded, you cannot reasonably expect logging to commence, and that is
> when it does on all systems of my acquaintance.