[conspire] Other notes from the Debian 5.0.1/Lenny to 6.0/Squeeze upgrade

Rick Moen rick at linuxmafia.com
Tue Aug 24 17:59:58 PDT 2010


Oh, and one last thing:  I should explain this bit.

> An earlier judgement error (early 2010) had left my system partly
> on Debian-stable and partly on Debian-testing, which is a really bad
> idea.  A couple of things were unhappy, but I'd not carried out the
> steps to forward-revision everything to Debian-testing out of a fear
> that there would be considerable breakage and require me to do a
> marathon of emergency system rebuilding.

What possessed me to do such a stupid thing?

It goes back to what happened in April 2008.  Shortly before I 
was due to go into the hospital for surgery, a spring lightning storm 
fried my server, y'all may recall.  My 1998-era VA Research Corp. model
500 machine was quite toasty, and really nothing was salvageable.  My 
best and most recent backup was files I'd rsync'd to Deirdre's
Solaris-based virtual-host-of-sorts at an ISP.  So, just after rapidly
building a new Debian-stable system on the current VA Linux model 2230
hardware, I rsync'd the files back -- and discovered to my dismay that
Solaris had munged all the file ownerships, because the rsync backup had
not been conducted as the root user.

I fixed everything that I could, and deployed the almost 100% rebuilt
machine a few hours after the old one fried.

However, there was the strangest thing:  The BIND9 DNS nameserver
refused to start.  This was A Big Problem for me -- so I kept playing
with it, and eventually discovered a bizarre workaround:

_If_ I manually executed /usr/sbin/named, which is the daemon binary,
and let it instantly die because it had been invoked with no
environment, conffiles, etc., _then_ running '/etc/init.d/bind9 start'
worked.
Bizarre, eh?  I kept reading logfiles and screwing with it trying to
figure out what was broken and why that worked -- to no avail.  So, I
simply got used to running /usr/sbin/named before the daemon would work.

However, that's no way to run a server.  One day, in frustration and
fatigue, even while knowing that it was a strategic mistake, I thought:
'Suppose there's a bug in the recent BIND9 releases on Debian-stable
that I happen to trigger.  Maybe it's fixed in later versions not yet
available on -stable.'  So, that was when I did the unwise thing,
repointing /etc/apt/sources.list to -unstable, and upgrading BIND9.

Which did not fix the problem -- and also pulled down problematic 
cutting-edge versions of packages as dependencies, notably a glibc that
gave a lot of software on -stable indigestion of various sorts.  But I 
didn't touch that until yesterday, because I suspected it would lead to
marathon efforts and just wasn't thrilled about the prospect.


The punch line:  In the middle of dealing with yesterday's problems, I
figured out the original BIND9 problem.  More or less.

It had to do with all of those backup files on the Solaris remote host.
Because Solaris had munged a lot of the ownerships, among the things
that got the wrong ownership was directory /var/run/named/ .  Which is
where the BIND9 daemon writes its pid file.  Unfortunately, it tried to
do that as user 'bind', group 'bind', while /var/run/named was owned by
root:root.

Somehow, manually running /usr/sbin/named was clearing?  chowning? the
pid file left over in /var/run/named and making it possible for the
BIND9 startup script to write a new one there, without which startup 
silently failed.

I've chowned /var/run/named/ to bind:bind, and now things seem to be
working correctly.
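
For anyone wanting to check for the same sort of ownership mismatch,
here is a minimal sketch, not a definitive procedure.  It assumes GNU
stat (the coreutils version shipped with Debian), and 'check_owner' is a
hypothetical helper name made up for illustration, not anything supplied
by BIND9 or Debian.  The path and owner in the usage comment are the
ones from this post.

```shell
#!/bin/sh
# Sketch: report whether a directory is owned by the expected user:group.
# Assumption: GNU stat, which supports -c with the %U (user) and %G (group)
# format specifiers.

check_owner() {
    dir=$1      # directory to inspect
    want=$2     # expected owner, written as user:group
    have=$(stat -c '%U:%G' "$dir") || return 2   # stat failed (e.g., no such dir)
    if [ "$have" = "$want" ]; then
        echo "ok: $dir is $have"
    else
        echo "MISMATCH: $dir is $have, expected $want"
        return 1
    fi
}

# Usage (as root), with the directory and owner described above:
#   check_owner /var/run/named bind:bind || chown bind:bind /var/run/named
```

The check is kept separate from the chown, so that you can see what the
ownership actually was before changing anything.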




