[conspire] "all" (mostly) better now: guido: Re: more/remaining filesystem corruption on at least root (/) filesystem
Michael Paoli
Michael.Paoli at cal.berkeley.edu
Thu Dec 7 21:37:55 PST 2023
Rick (& on-list),
So ..., re:
On 2023-12-03 03:10, Michael Paoli wrote:
> Rick,
>
> Well, fortunately doesn't look too bad.
> But still best to avoid unnecessary/excessive writes to root(/)
> filesystem until such can be properly and safely dealt with. Only a
> handful* of files would appear to be impacted on root (/) filesystem,
> and not seeing evidence of any actual problems on any other
> filesystems.
> Details can be found:
> guido:/var/local/guido.filesystem.issues/
> most notably these files thereunder:
> root.fs.stuff_to_fix.txt
> OTHER.fs.info
> ... "of course" there's lots more detail there too under that
> directory.
>
> *:
> /etc/debian_version
> /etc/default/keyboard
> #170
> #1785
> #2097
>
> On 2023-12-02 22:02, Michael Paoli wrote:
http://linuxmafia.com/pipermail/conspire/2023-December/012541.html
Was planning to get to that this evening ...
but as things happened / turned out ... started bit sooner.
So, this morning, linuxmafia(.com) had an issue,
Rick rebooted guido, linuxmafia apparently didn't start,
Rick reported that from guido,
virsh start linuxmafia
threw an AppArmor error
and bit later reported that guido root (/) filesystem was mounted ro.
Well, that would explain the issue with attempted (re)start of
linuxmafia (I'd earlier discovered that starting it (re)creates the
relevant AppArmor profile file(s) under /etc if they aren't already
present, and a file or two from there were among the files impacted by
the root (/) filesystem issue). So, no huge surprises there. As to why
linuxmafia.com went unresponsive, don't know, but it's been having some
occasional kernel Oops issue(s), and by the time I checked, if there was
any particular evidence as to why linuxmafia.com went unresponsive,
appears that evidence was no longer available.
So ... though I did at least partly catch Rick's communications on the
matter, I didn't catch the part about linuxmafia.com being down 'till
bit later (otherwise I would've hopped on it sooner). Anyway, after
I'd later also noticed that ...
Based on the earlier, in notable part:
# hostname && head -n 42 < /var/local/guido.filesystem.issues/root.fs.stuff_to_fix.txt
guido
So, as / is still mounted rw, further problems may develop,
but from the 2023-12-03T06:19:18.104255697Z snapshot,
########################################################################
well, first the high level - details further below
/etc/debian_version
Looks like existing content:
# DO NOT EDIT THIS FILE DIRECTLY. IT IS MANAGED BY LIBVIRT.
...
doesn't much matter,
restore from base-files
/etc/default/keyboard
Looks like existing content:
# DO NOT EDIT THIS FILE DIRECTLY. IT IS MANAGED BY LIBVIRT.
...
doesn't much matter,
should be 0:0 644 with contents:
# KEYBOARD CONFIGURATION FILE
# Consult the keyboard(5) manual page.
XKBMODEL="pc105"
XKBLAYOUT="us"
XKBVARIANT=""
XKBOPTIONS=""
BACKSPACE="guess"
And as for:
#170
Looks like existing content:
# DO NOT EDIT THIS FILE DIRECTLY. IT IS MANAGED BY LIBVIRT.
...
doesn't much matter,
And:
#1785
lrwxrwxrwx 1 root root 30 Aug 12 15:58 #1785 -> boot/initrd.img-6.1.0-11-amd64
looks like presently that should be the apparently missing:
/initrd.img.old
#2097
lrwxrwxrwx 1 root root 27 Aug 12 08:22 #2097 -> boot/vmlinuz-6.1.0-11-amd64
looks redundant with:
lrwxrwxrwx 1 root root 27 Nov 22 20:55 /vmlinuz.old -> boot/vmlinuz-6.1.0-11-amd64
########################################################################
#
So, I was guesstimating root (/) filesystem was still in relatively
similar condition ... except now mounted ro. If it had still been
mounted rw as it had been before, probably would've used tune2fs(8) to
force it to fsck upon reboot, reboot, and then proceeded from there.
But as it was already remounted ro, I handled it a bit differently.
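For illustration only (not what was actually run on guido): a minimal
sketch of checking whether a filesystem is presently mounted ro or rw,
which is the branch point described above. The function name mount_mode
is hypothetical, and it assumes a Linux-style mounts table (default
/proc/self/mounts) and a mount point without whitespace or backslashes.

```shell
# mount_mode MOUNTPOINT [MOUNTS_FILE]
# Print "ro" or "rw" for MOUNTPOINT, reading a /proc/mounts-style table.
# Returns non-zero (and prints nothing) if MOUNTPOINT isn't listed.
# Hypothetical helper for illustration.
mount_mode() {
    awk -v mp="$1" '
        $2 == mp {
            # Field 4 is the comma-separated mount option list.
            n = split($4, opts, ",")
            mode = "rw"
            # Compare options exactly, so that e.g. errors=remount-ro
            # is not mistaken for ro.
            for (i = 1; i <= n; i++)
                if (opts[i] == "ro")
                    mode = "ro"
            # If the mount point is listed more than once,
            # the last entry wins.
            found = 1
        }
        END { if (found) print mode; else exit 1 }
    ' "${2:-/proc/self/mounts}"
}
```

E.g.:
# mount_mode /
would print ro or rw for the root (/) filesystem.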
First I did
# fsck -f -n /dev/md5
Where /dev/md5 is the root filesystem (/) device,
notably to check that it didn't look too bad, and expecting it to look
quite like what I'd seen earlier ... and it did. That being the case,
and it also already mounted ro, I did:
# fsck -f -y /dev/md5
And that appearing to go very much as expected, then did:
# sync && sync && reboot -f -f
The syncs to ensure any changes from the fsck were flushed out to the
device, the -f -f options on reboot to do an immediate reboot - most
notably so it wouldn't possibly get stuck and hang on some
/etc/rc*.d/K* stop
processing due to root (/) being mounted ro.
Under other circumstances might've also used the -n or --no-sync option,
but since root (/) filesystem was already mounted ro, that didn't
particularly matter (would be nothing to flush there), and probably
better without that option in this case, as that would do a sync on the
other filesystems (where it might matter, and would be better with
sync). Also, with linuxmafia already down, even less risk in doing
reboot -f -f
as compared to a more customary orderly shutdown.
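Regarding the two fsck runs above: per fsck(8), the exit status is a
bitwise OR of conditions, so it can be checked rather than just
eyeballed. A minimal sketch, with a hypothetical helper name
(describe_fsck_status); this is not something that was run on guido:

```shell
# describe_fsck_status STATUS
# Decode an fsck(8) exit status (a bitwise OR of conditions) into text.
# Hypothetical helper for illustration.
describe_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    out=""
    # bit:meaning pairs, as documented in fsck(8)
    for pair in \
        "1:errors corrected" \
        "2:system should be rebooted" \
        "4:errors left uncorrected" \
        "8:operational error" \
        "16:usage or syntax error" \
        "32:checking canceled by user request" \
        "128:shared-library error"
    do
        bit=${pair%%:*}
        text=${pair#*:}
        # Append the meaning of each bit that is set in the status.
        if [ $(( status & bit )) -ne 0 ]; then
            out="${out:+$out; }$text"
        fi
    done
    echo "${out:-unrecognized status $status}"
}
```

E.g.:
# fsck -f -n /dev/md5; describe_fsck_status $?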
And when it came back up, root (/) filesystem was clean, and
linuxmafia(.com) was back up and running again.
I looked over the root (/) filesystem, and things were exactly as
expected, based upon earlier work. So I restored
/etc/debian_version from base-files_12.4+deb12u2_amd64.deb,
/etc/default/keyboard from what I'd earlier noted,
/initrd.img.old from /lost+found/#1785
And checked the remaining /lost+found/#* files and their content which
were:
#170
#2097
Those were as expected from the earlier work, and unneeded, so I got rid
of them.
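As for putting a symlink back from lost+found (as with #1785 above), a
minimal sketch of a cautious way to do so: confirm the entry is a
symlink with the expected target, and that the destination doesn't
already exist, before moving it. The function name restore_lost_symlink
is hypothetical, and this is not necessarily how it was done on guido.

```shell
# restore_lost_symlink SRC EXPECTED_TARGET DEST
# Move symlink SRC to DEST, but only if SRC is a symlink pointing at
# EXPECTED_TARGET and DEST doesn't already exist.
# Hypothetical helper for illustration.
restore_lost_symlink() {
    src=$1 expected=$2 dest=$3
    [ -L "$src" ] || { echo "$src: not a symlink" >&2; return 1; }
    actual=$(readlink "$src") || return 1
    if [ "$actual" != "$expected" ]; then
        echo "$src: points to $actual, expected $expected" >&2
        return 1
    fi
    # -e follows symlinks, so also test -L to catch a dangling DEST.
    if [ -e "$dest" ] || [ -L "$dest" ]; then
        echo "$dest: already exists" >&2
        return 1
    fi
    # mv renames the link itself; the target string is unchanged.
    mv -- "$src" "$dest"
}
```

E.g.:
# restore_lost_symlink '/lost+found/#1785' boot/initrd.img-6.1.0-11-amd64 /initrd.img.old
The target there is relative, so it resolves correctly once the link is
back in /. Quoting the name is a good habit, as a word that starts with
# (e.g. after cd /lost+found) would otherwise begin a shell comment.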