[sf-lug] SF-LUG (& BALUG) DNS, etc.

Sat Nov 14 21:20:20 PST 2009

Rick Moen,

Many thanks again for your excellent comments/observations regarding DNS
(e.g. SF-LUG.COM. and BALUG.ORG., etc.), both currently and in the
past.

Much of what you point out helps illustrate how to (and how not to) do
DNS migrations and troubleshooting, and the issues/impacts/risks
involved.

There definitely were (and likely still are) some SNAFUs to be corrected
(and things caught up, etc.) with the SF-LUG and BALUG related machine
shuffling that happened 2009-11-11.

Not to make excuses and/or point fingers, but I had incorrectly
presumed the SF-LUG.COM. DNS stuff was being handled (or mostly
handled), and I was mostly focusing on the BALUG.ORG. bits.  (I'm
probably about the tertiary systems administrator for SF-LUG's Xen domU
on the Silicon Mechanics host - which does or would cover the
SF-LUG.COM. DNS bits - whereas I'm probably the primary sysadmin, at
least in most regards, for the BALUG.ORG. stuff.)

I think (rough guesstimate) many of the SF-LUG DNS bits are (or were)
running behind on being corrected - BALUG's in better shape there
(though far from perfect if one digs deep enough).  On the other hand,
I think SF-LUG is further along than BALUG in getting a fair bunch of
services going again.

Anyway, I jumped at least a moderate bit into the SF-LUG.COM. DNS,
mostly to help provide essential clues on matters yet to be addressed
(I'm also still working on numerous BALUG items).

Anyway, some more comments in-line, further below.

> Date: Sat, 14 Nov 2009 01:22:51 -0800
> From: Rick Moen <rick at linuxmafia.com>
> Subject: Re: [sf-lug] SF-LUG DNS
> To: sf-lug at linuxmafia.com
> Message-ID: <20091114092251.GS21475 at linuxmafia.com>
> Content-Type: text/plain; charset=utf-8
>
> I wrote:
>
>> Hey, attention, guys:  You need to closely coordinate with your DNS
>> secondaries' sysadmins whenever you move your domain's master DNS to a
>> different IP.

Yes, definitely.  Slaves should know in advance about changes in master
IP(s) or other significant changes impacting slave(s).  Generally also
good to let them know about any significant outages (scheduled or
otherwise) - multiple masters and/or other redundancy can also help make
life easier/better for slaves (and their providers).

> Let me elaborate on that, for a moment.
>
> I'm guessing you guys moved the master nameserver for domain sf-lug.com
> without bothering to coordinate with your secondary.  Of the two IPs in
> your authoritative nameservers list (what "whois" returns and is in the
> parent zone's glue records), one (the master) doesn't respond at all to
> queries:

Yes, essentially correct - some relatively jarring shuffling (could
have been much smoother with much more time and planning invested, but,
well, with just volunteers, sometimes ...)

> $ dig -t soa sf-lug.com @208.96.15.252 +short
> ;; connection timed out; no servers could be reached
> $
>
> One authoritative nameserver (the secondary) does respond, but has been
> unable to get updates since Wednesday:
>
> $ dig -t soa sf-lug.com @198.144.195.186 +short
> ns1.sf-lug.com. jim.well.com. 2007102904 3600 3600 1209600 10800
> $
>
> So, your domain's entire DNS nameservice is currently degraded to the
> point where it's totally dependent on ONE MACHINE, which is among the
> reasons why you should always have minimum three, maximum seven in
> service.
>
> That secondary nameserver's data are going to expire on Wednesday,
> November 25, 6:29 PM, unless you fix the currently broken situation, at
> which point you will have no nameservice at all, in place of the current
> perilously fragile nameservice.

Yes, excellent observation and point - also hinted at by some of the
SOA data I earlier displayed via dig(1).  I knew from what I saw then
that we had about that much time before it went (super)critical - but I
made no statement about that (leaving it as exercise for those that
wished to note and point it out).  If I recall correctly, I dealt with
almost precisely that type of failure with SF-LUG.COM. once before (DNS
master host had been rebooted, init system wasn't configured to restart
that DNS server, so it stayed down - wasn't noticed until the single
slave expired the zone - "Oops!" ... I fairly quickly figured that out,
restarted the master DNS server, and reconfigured the init bits so that
it would automatically (re)start upon system (re)boot).
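
For anyone wanting to check the arithmetic on that expiry time, here's
a rough sketch in Python.  It takes the SOA line the slave returned
above and adds the SOA EXPIRE value to the time of the slave's last
successful refresh.  Note the last-refresh time below is an assumption
I've inferred backwards from Rick's stated expiry (not something I
measured on the slave), and the function names are just made up for
illustration.

```python
from datetime import datetime, timedelta

# Field order of the RDATA in an SOA record, as "dig +short" prints it.
SOA_FIELDS = ("mname", "rname", "serial", "refresh", "retry", "expire", "minimum")


def parse_soa(line):
    """Split a 'dig -t soa ZONE +short' answer line into named fields."""
    parts = line.split()
    if len(parts) != len(SOA_FIELDS):
        raise ValueError("not an SOA answer line: %r" % line)
    soa = dict(zip(SOA_FIELDS, parts))
    for key in SOA_FIELDS[2:]:  # serial and the four timers are integers
        soa[key] = int(soa[key])
    return soa


def zone_expiry(last_refresh, soa):
    """Time at which a slave stops answering for the zone if it never
    again manages to reach the master."""
    return last_refresh + timedelta(seconds=soa["expire"])


soa = parse_soa("ns1.sf-lug.com. jim.well.com. 2007102904 3600 3600 1209600 10800")

# ASSUMPTION: last successful refresh, local (Pacific) time, inferred
# from the expiry time Rick quoted rather than read from the slave.
last_ok = datetime(2009, 11, 11, 18, 29)

print(soa["expire"] // 86400, "days")  # 14 days
print(zone_expiry(last_ok, soa))       # 2009-11-25 18:29:00
```

So EXPIRE of 1209600 seconds is exactly 14 days, which is how a last
good refresh on Wednesday the 11th lands on Wednesday the 25th.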

This is also an area where SF-LUG could benefit from some appropriate
monitoring (e.g. Nagios) - so that problems are detected and appropriate
folks notified - and preferably before they become worse problems.  (And
yes, the same could definitely be said for BALUG.)
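
As a rough idea of what such a check might look like, here's a minimal
sketch in Python of the decision logic only.  It assumes something else
(cron job, Nagios wrapper, whatever) has already captured the output of
"dig -t soa ZONE @SERVER +short" for the master and for a slave, and it
maps the result onto the usual Nagios plugin exit codes (0 OK,
1 WARNING, 2 CRITICAL, 3 UNKNOWN).  The function names are hypothetical,
and it does a plain inequality test on serials rather than proper
serial number arithmetic, so treat it as a starting point, not a
finished plugin.

```python
# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def soa_serial(dig_short_output):
    """Return the serial from 'dig -t soa ... +short' output, or None
    if there is no usable answer (e.g. a ';; connection timed out' line)."""
    for line in dig_short_output.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blanks and dig's own diagnostics
        fields = line.split()
        if len(fields) == 7 and fields[2].isdigit():
            return int(fields[2])
    return None


def check_zone(master_output, slave_output):
    """Compare master and slave SOA answers; return (exit_code, message)."""
    master = soa_serial(master_output)
    slave = soa_serial(slave_output)
    if master is None:
        return CRITICAL, "master did not answer the SOA query"
    if slave is None:
        return CRITICAL, "slave did not answer the SOA query"
    if slave != master:
        return WARNING, "slave serial %d != master serial %d" % (slave, master)
    return OK, "serials match (%d)" % master


# The two answers Rick showed earlier in this thread:
master_out = ";; connection timed out; no servers could be reached"
slave_out = "ns1.sf-lug.com. jim.well.com. 2007102904 3600 3600 1209600 10800"

code, message = check_zone(master_out, slave_out)
print(code, message)  # 2 master did not answer the SOA query
```

Run regularly, something along those lines would have flagged the dead
master within one check interval, rather than leaving it to be noticed
when (or shortly before) the slave expires the zone.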

> So, NEVER move a domain's master nameserver to a new IP without
> coordinating closely with all of your secondaries IN ADVANCE.
>
> It is also an extremely bad idea to list the same telephone number and
> the same e-mail address for all contacts in a domain's whois record.
> At bare minimum, you should make sure that the Administrative Contact
> and the Technical Contact are different (and that neither's e-mail goes
> through the domain in question -- SPoF risk).
>
> You guys really should fix that, too.

Yes, a single person as point of contact is generally bad.  Typically
better to have multiple, and where feasible, use out-of-band (e.g.
role-based) alias(es) that have zero dependencies upon the DNS in
question.  "We" do in fact have an email alias for SF-LUG systems
administrators that has zero DNS dependencies upon SF-LUG.COM. - that
could potentially be used.  (It might not be suitable for the whois
admin contact, as more limited and restricted access may be desired
there - but it's probably fine for the SOA and/or whois technical
contact.)
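
A quick way to sanity-check a contact address for the most obvious
form of that dependency is sketched below in Python.  It converts an
SOA RNAME to an email address (first unescaped dot becomes "@") and
tests whether the mail domain sits at or under the zone in question.
It is only a first-pass check: it does not follow MX or NS records, so
an address can pass here and still depend on the zone indirectly.  The
function names and the "hostmaster" address are hypothetical, purely
for illustration.

```python
def rname_to_email(rname):
    """Convert an SOA RNAME such as 'jim.well.com.' to 'jim@well.com'.
    The first dot not escaped with a backslash separates local part
    from domain."""
    name = rname.rstrip(".")
    i = 0
    while i < len(name):
        if name[i] == "\\":
            i += 2  # skip the escaped character
            continue
        if name[i] == ".":
            local = name[:i].replace("\\.", ".")
            return local + "@" + name[i + 1:]
        i += 1
    raise ValueError("no domain part in RNAME: %r" % rname)


def mail_depends_on_zone(email, zone):
    """True if the address's mail domain is the zone itself or a
    subdomain of it (direct dependency only; MX/NS are not followed)."""
    domain = email.rsplit("@", 1)[1].rstrip(".").lower()
    zone = zone.rstrip(".").lower()
    return domain == zone or domain.endswith("." + zone)


contact = rname_to_email("jim.well.com.")  # RNAME from the SOA above
print(contact, mail_depends_on_zone(contact, "SF-LUG.COM."))

# Hypothetical address, to show what a dependent contact looks like:
print(mail_depends_on_zone("hostmaster@ns1.sf-lug.com", "SF-LUG.COM."))
```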

Also, Jim Stockford - at least after the master and at least one slave
are in good functioning state again for SF-LUG.COM., let me know if you
want leads on potential additional slaves - might be slightly dated now,
but a fair while back I asked that question for BALUG, and got quite a
number of responses, some of them being good/excellent resources for
such.

Also, as to getting the SF-LUG.COM. master squared away again (or at least
back to as good a shape as it was in before the system swap-out), we do
have this (references/excerpts/slightly redacted):

< Date: Wed, 11 Nov 2009 01:57:47 -0800
< From: "Michael Paoli" <Michael.Paoli at cal.berkeley.edu>
< Subject: FYI, fresh full backup of sf-lug.com. "tower" at ...
< To: <redacted alias>@balug.org
<
< FYI, I've also placed a nice fresh backup of almost* everything*
< from the sf-lug.com "tower" box, at least presently (mounted
< ro,nosuid,nodev) under
< /var/local/balug/backup/0/tower
< on the Silicon Mechanics "vicki" dom0 host.
<
< Also, since UIDs/GIDs between the two systems generally don't match,
< the location is rather protected:
< # ls -ldba /var/local/balug/backup/0
< drwx------ 3 root root 72 2009-11-10 22:57 /var/local/balug/backup/0
< #
<
< *I omitted stuff under:
< /tmp
< /tmpb
< /bootb
< and also skipped virtual and "empty" (or empty except for lost+found)
< filesystems; I also didn't back up any metadata (e.g. disk partitioning
< information, etc.)