[conspire] solved: Re: systemd 8-O ;-) ... bind9 chroot Debian 9 (stretch) --> Debian 10 (buster)

Michael Paoli Michael.Paoli at cal.berkeley.edu
Sat Apr 18 17:46:51 PDT 2020

Okay, got it solved.

Turned out to be relatively simple ... but I missed it on earlier passes.
The fix:
# rm /etc/systemd/system/bind9.service.d/bind9.conf
# systemctl daemon-reload

Bit o' background,
the earlier:
was put in place - and necessary - to do certain overrides
for systemd launching bind9 ... notably for the chroot.
However, now the unit file for bind9
had changed how it was
calling / expecting to launch bind9.  Notably:
So, that was conflicting with how
was firing up bind9's named.
Also, the newer
picks up the configuration bits in
(looks like the older did too ... but probably
the much older did not)
which also has the customizations for chroot,
so the (custom)
was no longer needed at all.
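A rough way to spot that sort of conflict, as a sketch only: it
assumes the newer packaged unit is Type=forking (my assumption here)
while the drop-in's ExecStart still passes -f to keep named in the
foreground. With Type=forking, systemd waits for the started process
to fork and exit, which a foreground named never does, so the start
would eventually time out. unit_fg_mismatch is a made-up helper name,
not anything shipped with systemd; it reads unit file text on stdin.

```shell
# Hypothetical helper (not part of systemd): read unit file text on
# stdin and report whether Type=forking is combined with a foreground
# (-f) ExecStart. The last Type= and ExecStart= lines seen win, so an
# empty "ExecStart=" reset line in a drop-in is honored.
unit_fg_mismatch() {
    awk '
        /^Type=/      { type = substr($0, 6) }
        /^ExecStart=/ { cmd = substr($0, 11) }
        END {
            # Pad with spaces so -f is matched only as a whole argument.
            fg = index(" " cmd " ", " -f ") > 0
            if (type == "forking" && fg)
                print "mismatch: Type=forking but ExecStart passes -f (foreground)"
            else
                print "ok"
        }'
}
```

On a live system the input could come from something like:
# systemctl cat bind9.service | unit_fg_mismatch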
Still not sure why it didn't work earlier
when I removed the -f option in
and I think I even tried removing that file entirely,
but I might've missed the
# systemctl daemon-reload
step earlier on some of those attempts.
In any case, cleanest "fix" for it:
# rm /etc/systemd/system/bind9.service.d/bind9.conf
# systemctl daemon-reload
(well ... notwithstanding totally gutting systemd ;-))
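For hunting down leftover overrides like that one, here's a minimal
sketch that lists the drop-in files for a unit. list_dropins is a
hypothetical helper name, and the default directory is just the usual
location for local overrides; both are for illustration only.

```shell
# Hypothetical helper: list *.conf drop-in files for a unit.
# Usage: list_dropins UNIT [BASEDIR]
# BASEDIR defaults to /etc/systemd/system (local admin overrides).
list_dropins() {
    unit=$1
    base=${2:-/etc/systemd/system}
    for f in "$base/$unit.d"/*.conf; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}
```

E.g. list_dropins bind9.service would have shown the bind9.conf
removed above. systemctl cat bind9.service and systemd-delta give a
fuller picture on a running system.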
is also pretty good, but it could do with some more updating (which
I may likely get around to if someone doesn't beat me to it).

> From: "Michael Paoli" <Michael.Paoli at cal.berkeley.edu>
> Subject: systemd 8-O ;-) ... bind9 chroot Debian 9 (stretch) --> Debian 10 (buster)
> Date: Sat, 18 Apr 2020 04:03:26 -0700

> So, ... hitting a systemd issue I'd like to figure out and get resolved.
> Yeah, I know, systemd, ugh ... but despite my also not much liking it,
> if reasonably feasible, want to see if I can get this issue resolved.
> So, bit o' background:
> So, ... working on (near) clone (balugclone) of system (balug).
> Near?  As in starting about identical, then mostly changing "just
> enough" (
> clone:
>     different Ethernet MAC address
>     (before even first booting) down interface link:
>     (
>     link=down; mac=52:54:00:67:20:40
>     virsh domif-setlink balugclone "$mac" "$link" --config
>     virsh domif-setlink balugclone "$mac" "$link"
>     virsh domif-getlink balugclone "$mac" --config
>     virsh domif-getlink balugclone "$mac"
>     )
>     change network from bridged to default (RFC-1918 + NAT/SNAT)
>     stop and disable potential conflicting services:
>     systemctl stop & systemctl disable:
>     mailman.service
>     exim4.service
>     apache2.service
>     spamassassin.service
>     rsync.service
>     mariadb.service
>     bind9.service
>     ...
> )
> to avoid conflicts with the running production balug
> Virtual Machine (VM) and its data, etc.
> And, what for?  Do a pre-production Debian 9 (stretch) --> 10 (buster)
> upgrade, to be able to plan for and have (theoretically) smooth actual
> production upgrade.  Alas, last time around, wasn't quite thorough
> enough:
> https://lists.balug.org/pipermail/balug-admin/2020-February/001018.html
> Anyway, this time, fair bit more progress (yea!) (notably working
> through sanity checks of at least basic functionality of important services).
> But alas, still bumping into one gotcha I've not yet found a fix for.
> And, yup, systemd specific.
> So, nameserver - running BIND9 under chroot.
> If I fire it up manually, in manner that sysvinit would were it present:
> # PATH=/sbin:/bin:/usr/sbin:/usr/bin start-stop-daemon --start --oknodo \
>   --quiet --exec /usr/sbin/named --pidfile /run/named/named.pid -- \
>   -u bind -t /var/lib/named
> Then all appears fine, it runs fine, functions, keeps working, etc.
> (note to safely test it on clone, also:
> clone:
>     /etc/network/interfaces disable interfaces except lo and change eth0
>         to inet dhcp
>     (eth0 & relevant configs later becomes ens3 through the upgrade)
>     shutdown
>     up interface link:
>     (link=up; mac=52:54:00:67:20:40
>     virsh domif-setlink balugclone "$mac" "$link" --config
>     virsh domif-setlink balugclone "$mac" "$link"
>     virsh domif-getlink balugclone "$mac" --config
>     virsh domif-getlink balugclone "$mac"
>     )
>     boot
>     and before enabling and attempting to (re)start bind9:
>     bind9 all notify off (no)
>     comment out notify-source and notify-source-v6
> )
> But alas, when started under systemd with:
> # systemctl start bind9.service
> Things go kind'a funky ... and fail in fairly short order.
> First of all, as far as I can tell, from both systemd config,
> and also looking at process arguments and such, looks like bind9
> fires up properly under chroot in either case.
> From: /etc/systemd/system/bind9.service.d/bind9.conf
> we have:
> ExecStart=/usr/sbin/named -f -u bind -t /var/lib/named
> Also, without that -f option there (and after:
> # systemctl daemon-reload
> )
> it then effectively doesn't (as far as systemd/systemctl is concerned)
> work at all, failing quite immediately with:
> systemd[1]: bind9.service: Control process exited, code=exited, status=1/FAILURE
> ... even though bind9/named is and continues to run fine in that case ...
> but the systemd/systemctl status is all wrong, as it thinks it failed,
> so, need the -f option.  Anyway, back to with -f (foreground) option:
> And ... smoking gun ... strace(1).
> It looks like in both cases (manual sysvinit-like start, or
> systemd:
> # systemctl start bind9.service
> ) named itself starts and
> runs fine ... it's actually a systemd (configuration?) problem!
> And, how did I find that?  When the named process fails, it fails
> because it's getting SIGTERM!!!:
> 4539  --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=1, si_uid=0} ---
> This seems to consistently happen about 90 seconds after systemd/systemctl
> "starts" (attempts to start) it.
> And ...:
> 4689  kill(4690, SIGTERM)               = 0
> (the only reason the two PIDs between that and the earlier above don't
> match, is they were captured in separate runs).
> It's systemd/systemctl that's sending the signal that's causing
> bind9 (named) to shut down - that's also 100% consistent with what the
> logs show, e.g. (shortening the timestamps to MM:SS):
> 51:42 balug-sf-lug-v2 named[5518]: resolver priming query complete
> 53:12 balug-sf-lug-v2 named[5518]: shutting down
> 53:12 balug-sf-lug-v2 named[5518]: stopping command channel on
> 53:12 balug-sf-lug-v2 named[5518]: stopping command channel on ::1#953
> 53:12 balug-sf-lug-v2 named[5518]: no longer listening on ::#53
> 53:12 balug-sf-lug-v2 named[5518]: no longer listening on
> 53:12 balug-sf-lug-v2 named[5518]: no longer listening on
> 53:12 balug-sf-lug-v2 named[5518]: exiting
> So ... at this point I'm trying to figure out why systemd/systemctl
> is SIGTERMing named - when it ought not.  I'm guesstimating maybe
> it tries to do some "health check", and does it improperly, and after
> 90 seconds "gives up" and SIGTERMs the PID.
> I also notice:
> # systemctl start bind9.service
> ... if done from terminal, that remains in the foreground the entire time.
> So seems systemd/systemctl is "waiting" for some check to pass before
> "releasing", and instead times out waiting, gives up, and zaps the PID.
> So, curious if any folks might know or have more clue(s) as to what
> to look at and/or where to get down to the bottom of this
> systemd/systemctl issue with bind9/named (also not seeing this issue
> with any of the other services).
> Other interesting bit ... (maybe just distraction / red herring):
> /bin/systemd-tty-ask-password-agent
> systemd/systemctl, done with interactive start from terminal,
> fires up (forks (clone) and execs /bin/systemd-tty-ask-password-agent
> with argument of --wait).  If I redirect stdin from /dev/null,
> e.g.:
> # </dev/null systemctl start bind9.service
> I don't end up with the /bin/systemd-tty-ask-password-agent process
> hanging out for the duration ... but even in that case, named still
> gets SIGTERMed by systemd/systemctl right around 90 seconds after it's
> been fired up.
> Also, on details, systemd/systemctl sends SIGCONT immediately
> before the SIGTERM ... but it's the SIGTERM that has everything going
> sideways and TERMinates the running bind9/named.
> Also, if folks are curious, here are some of the key bits
> that allow bind9/named to function under chroot:
> $ grep named.\*bind /etc/fstab
> /dev/null /var/lib/named/dev/null none bind 0 0
> /dev/random /var/lib/named/dev/random none bind 0 0
> /run/named /var/lib/named/run/named none bind 0 0
> /usr/share/dns /var/lib/named/usr/share/dns none bind 0 0
> $
> That, and some symlink(s), etc., and it works under chroot ...
> and stuff that needs to and ought to interact with it, from outside of
> chroot, all works and plays nice together (almost the same as
> Debian 9 (stretch) ... just one more directory from /usr for
> Debian 10 (buster)).  And with that infrastructure, it probably also
> runs just fine outside of chroot too, without any changes ... but I
> really don't want to be running it outside of the chroot.
> Ah, what the heck, it's non-production, let's try ...
> /etc/systemd/system/bind9.service.d/bind9.conf
> ExecStart=/usr/sbin/named -f -u bind
> # systemctl daemon-reload
> # systemctl start bind9.service
> ... and still fails same way (again shortening the timestamps to MM:SS):
> 11:19 balug-sf-lug-v2 named[5991]: resolver priming query complete
> 12:49 balug-sf-lug-v2 named[5991]: shutting down
> 12:49 balug-sf-lug-v2 named[5991]: stopping command channel on
> 12:49 balug-sf-lug-v2 named[5991]: stopping command channel on ::1#953
> 12:49 balug-sf-lug-v2 named[5991]: no longer listening on ::#53
> 12:49 balug-sf-lug-v2 named[5991]: no longer listening on
> 12:49 balug-sf-lug-v2 named[5991]: no longer listening on
> 12:49 balug-sf-lug-v2 named[5991]: exiting
> And if I do it sysvinit-like start, without chroot:
> # PATH=/sbin:/bin:/usr/sbin:/usr/bin start-stop-daemon --start --oknodo \
>   --quiet --exec /usr/sbin/named --pidfile /run/named/named.pid -- \
>   -u bind
> ... it continues to stay up and running no problem, long past 90 seconds,
> so it appears this isn't a chroot issue at all.
> FYI:
> $ ls -l /etc/bind
> lrwxrwxrwx 1 root root 25 Mar 15  2014 /etc/bind -> ../var/lib/named/etc/bind
> $
> Anyway, mostly that, and the bind mounts, and appropriate
> permissions/ownerships, and it plays well in and/or out of chroot (alas,
> probably the first time I fired it up outside of chroot in many years).
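
On the "about 90 seconds" in the quoted log excerpts: the gap between
"resolver priming query complete" and "shutting down" can be checked
with a small sketch like the one below. mmss_to_sec and mmss_diff are
hypothetical helper names, and they assume both MM:SS stamps fall
within the same hour. systemd's default start timeout is 90 seconds,
which would fit, though I haven't confirmed that's the timer involved
here.

```shell
# Hypothetical helpers: work with the MM:SS stamps used in the
# shortened log lines. Both stamps must be within the same hour.
mmss_to_sec() {
    m=${1%%:*}
    s=${1##*:}
    # Drop one leading zero so values like 08 are not read as octal.
    m=${m#0}; m=${m:-0}
    s=${s#0}; s=${s:-0}
    echo $(( m * 60 + s ))
}

# Usage: mmss_diff START END  (prints END - START in seconds)
mmss_diff() {
    echo $(( $(mmss_to_sec "$2") - $(mmss_to_sec "$1") ))
}
```

E.g. mmss_diff 51:42 53:12 and mmss_diff 11:19 12:49 cover the two
runs quoted above.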
