[conspire] Other notes from the Debian 5.0.1/Lenny to 6.0/Squeeze upgrade
Rick Moen
rick at linuxmafia.com
Tue Aug 24 17:59:58 PDT 2010
Oh, and one last thing: I should explain this bit.
> An earlier judgement error (early 2010) had left my system partly
> on Debian-stable and partly on Debian-testing, which is a really bad
> idea. A couple of things were unhappy, but I'd not carried out the
> steps to forward-revision everything to Debian-testing out of a fear
> that there would be considerable breakage and require me to do a
> marathon of emergency system rebuilding.
What possessed me to do such a stupid thing?
It goes back to what happened in April 2008. Shortly before I
was due to go into the hospital for surgery, a spring lightning storm
fried my server, y'all may recall. My 1998-era VA Research Corp. model
500 machine was quite toasty, and really nothing was salvageable. My
best and most recent backup was files I'd rsync'd to Deirdre's
Solaris-based virtual-host-of-sorts at an ISP. So, just after rapidly
building a new Debian-stable system on the current VA Linux model 2230
hardware, I rsync'd the files back -- and discovered to my dismay that
Solaris had munged all the file ownerships, because the rsync backup had
not been conducted as the root user.
I fixed everything that I could, and deployed the almost 100% rebuilt
machine a few hours after the old one fried.
However, there was the strangest thing: The BIND9 DNS nameserver
refused to start. This was A Big Problem for me -- so I kept playing
with it, and eventually discovered a bizarre workaround:
_If_ I manually executed /usr/sbin/named, which is the daemon binary,
and let it instantly die because it had no environment, conffiles, etc.
invoked with it, _then_ running '/etc/init.d/bind9 start' worked.
Bizarre, eh? I kept reading logfiles and screwing with it trying to
figure out what was broken and why that worked -- to no avail. So, I
simply got used to running /usr/sbin/named before the daemon would work.
However, that's no way to run a server. One day, in frustration and
fatigue, even while knowing that it was a strategic mistake, I thought:
'Suppose there's a bug in the recent BIND9 releases on Debian-stable
that I happen to trigger. Maybe it's fixed in later versions not yet
available on -stable.' So, that was when I did the unwise thing,
repointing /etc/apt/sources.list to -unstable, and upgrading BIND9.
Which did not fix the problem -- and also pulled down problematic
cutting-edge versions of packages as dependencies, notably a glibc that
gave a lot of software on -stable indigestion of various sorts. But I
didn't touch that until yesterday, because I suspected it would lead to
marathon efforts and just wasn't thrilled about the prospect.
The punch line: In the middle of dealing with yesterday's problems, I
figured out the original BIND9 problem. More or less.
It had to do with all of those backup files on the Solaris remote host.
Because Solaris had munged a lot of the ownerships, among the things
that got the wrong ownership was directory /var/run/named/ . Which is
where the BIND9 daemon writes its pid file. Unfortunately, it tried to
do that as user 'bind', group 'bind', while /var/run/named was owned by
root:root.
Somehow, manually running /usr/sbin/named was clearing? chowning? the
pid file left over in /var/run/named and making it possible for the
BIND9 startup script to write a new one there, without which startup
silently failed.
I've chowned /var/run/named/ to bind:bind, and now things seem to be
working correctly.
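For anyone who wants to check for the same kind of ownership damage after a restore, here is a minimal sketch in Python. The path /var/run/named and the bind:bind owner come from the description above; the function names owner_of and check_owner are made up for illustration, and this only reports a mismatch rather than fixing it.

```python
import grp
import os
import pwd


def owner_of(path):
    """Return (user, group) names owning path, falling back to numeric IDs."""
    st = os.stat(path)
    try:
        user = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        # No passwd entry for this UID (e.g. after a munged restore)
        user = str(st.st_uid)
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = str(st.st_gid)
    return user, group


def check_owner(path, want_user, want_group):
    """Return None if ownership matches, else a one-line description."""
    user, group = owner_of(path)
    if (user, group) == (want_user, want_group):
        return None
    return f"{path}: owned by {user}:{group}, expected {want_user}:{want_group}"


if __name__ == "__main__":
    # Directory and expected owner as described in this post
    try:
        problem = check_owner("/var/run/named", "bind", "bind")
    except OSError as err:
        problem = f"could not stat /var/run/named: {err}"
    print(problem or "ownership looks right")
```

The same check could be pointed at any other directory a daemon needs to write to as a non-root user, which is the sort of thing that goes wrong quietly when a backup was not taken as root.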