[conspire] solved: Re: systemd 8-O ;-) ... bind9 chroot Debian 9 (stretch) --> Debian 10 (buster)

Texx texxgadget at gmail.com
Tue Apr 21 17:35:58 PDT 2020


In full-on snark mode, I'll comment that this must be why Rick was going off
on the whole systemd thing.

Seriously, I found your post interesting and educational.

I have only begun to encounter systemd, so any posts on it are helpful.
Once you get the hang of it, I must admit that with several systems
combined into systemd and a single way to manage them,
it has the potential for simplification.


On Sat, Apr 18, 2020 at 5:49 PM Michael Paoli <
Michael.Paoli at cal.berkeley.edu> wrote:

> Okay, got it solved.
>
> Turned out to be relatively simple ... but I missed it on earlier passes.
> The fix:
> # rm /etc/systemd/system/bind9.service.d/bind9.conf
> # systemctl daemon-reload
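
For anyone else chasing a stale override like this one, here is a minimal
sketch of how one might list the drop-in files that set ExecStart for a unit.
The function name list_execstart_overrides and its directory argument are
hypothetical (mine, not from the post). It only greps files on disk and does
not ask systemd anything, so "systemctl cat bind9.service" remains the
authoritative view of what is actually in effect.

```shell
#!/bin/sh
# list_execstart_overrides DIR
# Print each *.conf file in DIR that contains an ExecStart= line,
# followed by the matching line(s), indented.
# Assumption: DIR is a drop-in directory such as
# /etc/systemd/system/bind9.service.d, as described in the post.
list_execstart_overrides() {
    dir=$1
    [ -d "$dir" ] || return 0          # no drop-in directory: nothing to report
    for f in "$dir"/*.conf; do
        [ -f "$f" ] || continue        # unmatched glob stays literal; skip it
        if grep -q '^ExecStart=' "$f"; then
            printf '%s\n' "$f"
            grep '^ExecStart=' "$f" | sed 's/^/    /'
        fi
    done
}
```

Usage would be something like:
list_execstart_overrides /etc/systemd/system/bind9.service.d
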
>
> Bit 'o background,
> the earlier:
> /etc/systemd/system/bind9.service.d/bind9.conf
> was put in place - and necessary - to do certain overrides
> for systemd launching bind9 ... notably for the chroot.
> However, now the unit file for bind9
> /lib/systemd/system/bind9.service
> had changed how it was
> calling / expecting to launch bind9.  Notably:
> [Service]
> Type=forking
> So, that was conflicting with how
> /etc/systemd/system/bind9.service.d/bind9.conf
> was firing up bind9's named.
> Also, the newer
> /lib/systemd/system/bind9.service
> picks up the configuration bits in
> /etc/default/bind9
> (looks like the older did too ... but probably
> the much older did not)
> which also has the customizations for chroot,
> so the (custom)
> /etc/systemd/system/bind9.service.d/bind9.conf
> was no longer needed at all.
> Still not sure why it earlier didn't work
> when I removed the -f option in
> /etc/systemd/system/bind9.service.d/bind9.conf
> and I think I even tried removing that file entirely,
> but I might've missed the
> # systemctl daemon-reload
> step earlier on some of those attempts.
> In any case, cleanest "fix" for it:
> # rm /etc/systemd/system/bind9.service.d/bind9.conf
> # systemctl daemon-reload
> (well ... notwithstanding totally gutting systemd ;-))
> https://wiki.debian.org/Bind9
> is also pretty good, but it could do with some more updating (which
> I may likely get around to if someone doesn't beat me to it).
>
> > From: "Michael Paoli" <Michael.Paoli at cal.berkeley.edu>
> > Subject: systemd 8-O ;-) ... bind9 chroot Debian 9 (stretch) -->
> > Debian 10 (buster)
> > Date: Sat, 18 Apr 2020 04:03:26 -0700
>
> > So, ... hitting a systemd issue I'd like to figure out and get resolved.
> > Yeah, I know, systemd, ugh ... but despite my also not much liking it,
> > if reasonably feasible, want to see if I can get this issue resolved.
> > So, bit 'o background:
> >
> > So, ... working on (near) clone (balugclone) of system (balug).
> > Near?  As in starting about identical, then mostly changing "just
> > enough" (
> > clone:
> >     different Ethernet MAC address
> >     (before even first booting) down interface link:
> >     (
> >     link=down; mac=52:54:00:67:20:40
> >     virsh domif-setlink balugclone "$mac" "$link" --config
> >     virsh domif-setlink balugclone "$mac" "$link"
> >     virsh domif-getlink balugclone "$mac" --config
> >     virsh domif-getlink balugclone "$mac"
> >     )
> >     change network from bridged to default (RFC-1918 + NAT/SNAT)
> >     stop and disable potential conflicting services:
> >     systemctl stop & systemctl disable:
> >     mailman.service
> >     exim4.service
> >     apache2.service
> >     spamassassin.service
> >     rsync.service
> >     mariadb.service
> >     bind9.service
> >     ...
> > )
> > to avoid conflicts with the running production balug
> > Virtual Machine (VM) and its data, etc.
> > And, what for?  Do a pre-production Debian 9 (stretch) --> 10 (buster)
> > upgrade, to be able to plan for and have (theoretically) smooth actual
> > production upgrade.  Alas, last time around, wasn't quite thorough
> > enough:
> > https://lists.balug.org/pipermail/balug-admin/2020-February/001018.html
> >
> > Anyway, this time, fair bit more progress (yea!) (notably working
> > through sanity checks of at least basic functionality of important
> > services).
> >
> > But alas, still bumping into one gotcha I've not yet found a fix for.
> > And, yup, systemd specific.
> >
> > So, nameserver - running BIND9 under chroot.
> > If I fire it up manually, in manner that sysvinit would were it present:
> > # PATH=/sbin:/bin:/usr/sbin:/usr/bin start-stop-daemon --start --oknodo \
> >   --quiet --exec /usr/sbin/named --pidfile /run/named/named.pid -- \
> >   -u bind -t /var/lib/named
> > Then all appears fine, it runs fine, functions, keeps working, etc.
> > (note to safely test it on clone, also:
> > clone:
> >     /etc/network/interfaces disable interfaces except lo and change eth0
> >         to inet dhcp
> >     (eth0 & relevant configs later becomes ens3 through the upgrade)
> >     shutdown
> >     up interface link:
> >     (link=up; mac=52:54:00:67:20:40
> >     virsh domif-setlink balugclone "$mac" "$link" --config
> >     virsh domif-setlink balugclone "$mac" "$link"
> >     virsh domif-getlink balugclone "$mac" --config
> >     virsh domif-getlink balugclone "$mac"
> >     )
> >     boot
> >     and before enabling and attempting to (re)start bind9:
> >     bind9 all notify off (no)
> >     comment out notify-source and notify-source-v6
> > )
> >
> > But alas, when started under systemd with:
> > # systemctl start bind9.service
> > Things go kind'a funky ... and fail in fairly short order.
> > First of all, as far as I can tell, from both systemd config,
> > and also looking at process arguments and such, looks like bind9
> > fires up properly under chroot in either case.
> > From: /etc/systemd/system/bind9.service.d/bind9.conf
> > we have:
> > ExecStart=/usr/sbin/named -f -u bind -t /var/lib/named
> >
> > Also, without that -f option there (and after:
> > # systemctl daemon-reload
> > )
> > it then effectively doesn't (as far as systemd/systemctl is concerned)
> > work at all, failing quite immediately with:
> > systemd[1]: bind9.service: Control process exited, code=exited,
> > status=1/FAILURE
> > ... even though bind9/named is and continues to run fine in that case ...
> > but the systemd/systemctl status is all wrong, as it thinks it failed,
> > so, need the -f option.  Anyway, back to with -f (foreground) option:
> >
> > And ... smoking gun ... strace(1).
> > It looks like in both cases (manual sysvinit-like start, or
> > systemd:
> > # systemctl start bind9.service
> > )
> > named itself starts and
> > runs fine ... it's actually a systemd (configuration?) problem!
> > And, how did I find that?  When the named process fails, it fails
> > because it's getting SIGTERM!!!:
> > 4539  --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=1, si_uid=0} ---
> > This seems to consistently happen about 90 seconds after
> > systemd/systemctl "starts" (attempts to start) it.
> > And ...:
> > 4689  kill(4690, SIGTERM)               = 0
> > (the only reason the two PIDs between that and the earlier above don't
> > match, is they were captured in separate runs).
> > It's systemd/systemctl that's sending the signal that's causing
> > bind9 (named) to shut down - that's also 100% consistent with what the
> > logs show, e.g. (shortening the timestamps to MM:SS):
> > 51:42 balug-sf-lug-v2 named[5518]: resolver priming query complete
> > 53:12 balug-sf-lug-v2 named[5518]: shutting down
> > 53:12 balug-sf-lug-v2 named[5518]: stopping command channel on 127.0.0.1#953
> > 53:12 balug-sf-lug-v2 named[5518]: stopping command channel on ::1#953
> > 53:12 balug-sf-lug-v2 named[5518]: no longer listening on ::#53
> > 53:12 balug-sf-lug-v2 named[5518]: no longer listening on 127.0.0.1#53
> > 53:12 balug-sf-lug-v2 named[5518]: no longer listening on 192.168.122.245#53
> > 53:12 balug-sf-lug-v2 named[5518]: exiting
> > So ... at this point I'm trying to figure out why systemd/systemctl
> > is SIGTERMing named - when it ought not.  I'm guesstimating maybe
> > it tries to do some "health check", and does it improperly, and after
> > 90 seconds "gives up" and SIGTERMs the PID.
> > I also notice:
> > # systemctl start bind9.service
> > ... if done from terminal, that remains in the foreground the entire time.
> > So seems systemd/systemctl is "waiting" for some check to pass before
> > "releasing", and instead times out waiting, gives up, and zaps the PID.
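
For what it's worth, the roughly 90 seconds looks like systemd's default start
timeout (DefaultTimeoutStartSec, 90s unless changed) rather than a health
check: with Type=forking, systemd waits for the ExecStart process to fork and
exit, and named -f stays in the foreground and never does. Below is a small
sketch for checking the gap between the MM:SS log stamps quoted in this
thread. The function name mmss_gap is hypothetical, and it assumes both
stamps fall within the same hour.

```shell
#!/bin/sh
# mmss_gap START END
# Print the number of seconds between two MM:SS timestamps.
# Assumption: both stamps are within the same hour, END not before START.
mmss_gap() {
    printf '%s %s\n' "$1" "$2" | awk '{
        split($1, a, ":")              # a[1] = minutes, a[2] = seconds
        split($2, b, ":")
        print (b[1] * 60 + b[2]) - (a[1] * 60 + a[2])
    }'
}

mmss_gap 51:42 53:12   # chroot run from the logs: prints 90
mmss_gap 11:19 12:49   # non-chroot run from the logs: prints 90
```

Both runs in the logs come out at exactly 90, which fits a fixed timeout
better than anything named itself is doing.
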
> >
> > So, curious if any folks might know or have more clue(s) as to what
> > to look at and/or where to get down to the bottom of this
> > systemd/systemctl issue with bind9/named (also not seeing this issue
> > with any of the other services).
> >
> >
> > Other interesting bit ... (maybe just distraction / red herring):
> > /bin/systemd-tty-ask-password-agent
> > systemd/systemctl, done with interactive start from terminal,
> > fires up (forks (clone) and execs /bin/systemd-tty-ask-password-agent
> > with argument of --wait).  If I redirect stdin from /dev/null,
> > e.g.:
> > # </dev/null systemctl start bind9.service
> > I don't end up with the /bin/systemd-tty-ask-password-agent process
> > hanging out for the duration ... but even in that case, named still
> > gets SIGTERMed by systemd/systemctl right around 90 seconds after it's
> > been fired up.
> > Also, on details, systemd/systemctl sends SIGCONT immediately
> > before the SIGTERM ... but it's the SIGTERM that has everything going
> > sideways and TERMinates the running bind9/named.
> >
> > Also, if folks are curious, here are some of the key bits
> > that allow bind9/named to function under chroot:
> > $ grep named.\*bind /etc/fstab
> > /dev/null /var/lib/named/dev/null none bind 0 0
> > /dev/random /var/lib/named/dev/random none bind 0 0
> > /run/named /var/lib/named/run/named none bind 0 0
> > /usr/share/dns /var/lib/named/usr/share/dns none bind 0 0
> > $
> > That, and some symlink(s), etc., and it works under chroot ...
> > and stuff that needs and ought interact with it, from outside of
> > chroot, all works and plays nice together (almost the same as
> > Debian 9 (stretch) ... just one more directory from /usr for
> > Debian 10 (buster)).  And with that infrastructure, it probably also
> > runs just fine outside of chroot too, without any changes ... but I
> > really don't want to be running it outside of the chroot.
> > Ah, what the heck, it's non-production, let's try ...
> > /etc/systemd/system/bind9.service.d/bind9.conf
> > ExecStart=/usr/sbin/named -f -u bind
> > # systemctl daemon-reload
> > # systemctl start bind9.service
> > ... and still fails same way (again shortening the timestamps to MM:SS):
> > 11:19 balug-sf-lug-v2 named[5991]: resolver priming query complete
> > 12:49 balug-sf-lug-v2 named[5991]: shutting down
> > 12:49 balug-sf-lug-v2 named[5991]: stopping command channel on 127.0.0.1#953
> > 12:49 balug-sf-lug-v2 named[5991]: stopping command channel on ::1#953
> > 12:49 balug-sf-lug-v2 named[5991]: no longer listening on ::#53
> > 12:49 balug-sf-lug-v2 named[5991]: no longer listening on 127.0.0.1#53
> > 12:49 balug-sf-lug-v2 named[5991]: no longer listening on 192.168.122.245#53
> > 12:49 balug-sf-lug-v2 named[5991]: exiting
> > And if I do it sysvinit-like start, without chroot:
> > # PATH=/sbin:/bin:/usr/sbin:/usr/bin start-stop-daemon --start --oknodo \
> >   --quiet --exec /usr/sbin/named --pidfile /run/named/named.pid -- \
> >   -u bind
> > ... it continues to stay up and running no problem, long past 90 seconds,
> > so appears it's not only not a chroot issue, but not even at all specific
> > to chroot.
> > FYI:
> > $ ls -l /etc/bind
> > lrwxrwxrwx 1 root root 25 Mar 15  2014 /etc/bind -> ../var/lib/named/etc/bind
> > $
> > Anyway, mostly that, and the bind mounts, and appropriate
> > permissions/ownerships, and it plays well in and/or out of chroot (alas,
> > probably the first time I fired it up outside of chroot in many years).
>
>


-- 

R "Texx" Woodworth
Sysadmin, E-Postmaster, IT Molewhacker
"Face down, 9 edge 1st, roadkill on the information superdata highway..."