[conspire] Slice of life

Rick Moen rick at linuxmafia.com
Fri Sep 18 15:51:06 PDT 2009


So, just a couple of words about "the Rick test", for the few of you who
are likewise administering significant DNS server installations using
ISC's BIND9 software.  It was:


/usr/sbin/named-checkconf -z -t /var/named/chroot/ /etc/named.conf 2>&1 | \
egrep -e 'missing|not allowed|unknown|not at top of zone' \
  -e 'appears to be an address|no current owner name|MAXTTL|file not found' \
  -e 'may not be used with|outside epoch|in future|invalid|unsupported|no TTL' \
  -e 'ignoring|TTL set to prior TTL' | sort -u
# Should return null (i.e., print nothing).


You might be wondering what that's all about.


When I took over being the main guy in charge of $FIRM's DNS, I noticed
a recurring syndrome:  Somebody would push a DNS change out of cvs,
sometimes picking up _other_ people's cvs checkins in so doing.  That
person would then go to the master nameserver to bring the changes
online.  

In cases where the change includes dropping a domain, adding a domain,
or otherwise changing BIND9's configuration file, it's not sufficient
to issue the command ("rndc reload") that tells BIND "Don't restart
everything, but just reload all of the DNS zones you service from disk."
In those cases, you have to do "service named restart", which (of
course) first stops the BIND9 daemon completely, unloading it from
memory, and then reloads and relaunches it.

With "rndc reload", the worst that might happen is that BIND9 would
refuse to load a zonefile that it didn't like (because of a syntax error
or similar).
However, with "service named restart", something far, far worse can
happen:  If BIND9 sees something it doesn't like in _either_ its
configuration files _or_ any of the (potentially large number of)
zonefiles, it will choke and die in the middle of loading zones.

Not only that:  Even though BIND9 lists the names of zones as it starts
and loads them, the last one echoed before the daemon dies tells you
nothing about where the problem is.  There you sit, trying to triage
the problem, while waiting for the automated alarms to start coming in,
and the CEO to walk over and ask "How'd you manage to break the master
nameserver?"


Some relatively recent version of BIND9 finally introduced a _separate_
pair of utilities, named-checkzone and named-checkconf, that externally
provide the proper input validation that remains missing from the BIND9
daemon itself.  named-checkzone can check individual zonefiles for basic
syntax errors -- but doesn't understand chrooting, and so breaks on
$INCLUDE references to files within a chroot jail.

named-checkconf is more useful:  By itself, it checks BIND9's conffiles
for basic syntax errors, and _does_ understand the effects of chrooting.  
Even better, if you include the "-z" flag, it'll also check referenced
zonefiles, again, with correct comprehension of what chrooting is all
about.

So:  "/usr/sbin/named-checkconf -z -t /var/named/chroot/ /etc/named.conf"
produces a very detailed listing of any problems with, first,
/etc/named.conf (and its include files) as a BIND9 configuration set, and
then with each of the zones referenced in the conffiles.
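Because named-checkconf also exits with a non-zero status when it finds errors, the "service named restart" trap described earlier can be avoided with a small guard that refuses to restart named unless the check passes.  This is a hypothetical sketch, not anyone's production script: checked_restart and the CHROOT, CONF, CHECKCONF and RESTART variables are illustrative names, and the defaults assume the paths used in this post plus a SysV-style "service" command.  Warnings do not change named-checkconf's exit status, so a guard like this catches fatal errors only; the warnings still need the filtering discussed below.

```shell
#!/bin/sh
# Hypothetical guard: restart named only if "named-checkconf -z" passes.
# All four variables are illustrative; override them from the environment.
CHROOT=${CHROOT:-/var/named/chroot/}
CONF=${CONF:-/etc/named.conf}
CHECKCONF=${CHECKCONF:-/usr/sbin/named-checkconf}
RESTART=${RESTART:-"service named restart"}

checked_restart() {
    # Non-zero exit means errors in the conffiles or (because of -z) in a
    # zone.  Warnings, e.g. check-names complaints, do not affect it.
    if "$CHECKCONF" -z -t "$CHROOT" "$CONF" >/dev/null 2>&1; then
        $RESTART    # unquoted on purpose, so the command string word-splits
    else
        echo "named-checkconf found errors; not restarting named" >&2
        return 1
    fi
}
```

Running "checked_restart" then either restarts named or leaves the running daemon alone and says why.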

The remaining problem is that named-checkconf's report is way, way too
verbose.  Errors and warnings don't stick out, unless you are reading
the hundreds of lines of output very attentively.

To fix that problem, I used ldd to find the ISC library file that
contains all of named-checkconf's error messages, then extracted from
those strings sixteen substrings that seemed worth worrying about.  The
resulting egrep incantation says "discard every named-checkconf output
line that doesn't contain one of these significant error strings, and
show only the lines that do."  The final "sort -u" then collapses
duplicate lines.
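Packaged as a shell function, the same filter might look like the sketch below.  The name rick_filter is hypothetical; the sixteen substrings are exactly the ones in the test quoted at the top, split across several -e options so that no pattern has to be continued across a line break inside quotes.  It assumes a grep that supports -E and multiple -e patterns (GNU or POSIX grep).

```shell
#!/bin/sh
# Sketch: the sixteen-substring filter from this post as a function.
# rick_filter is a hypothetical name.  It reads named-checkconf output on
# stdin, keeps only lines containing a worrying substring, and drops
# duplicates.  Empty output means nothing matched.
rick_filter() {
    grep -E \
        -e 'missing|not allowed|unknown|not at top of zone' \
        -e 'appears to be an address|no current owner name|MAXTTL' \
        -e 'file not found|may not be used with|outside epoch|in future' \
        -e 'invalid|unsupported|no TTL|ignoring|TTL set to prior TTL' |
        sort -u
}
```

Typical use would be "/usr/sbin/named-checkconf -z -t /var/named/chroot/ /etc/named.conf 2>&1 | rick_filter"; adding a substring later means adding one more -e line.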

As a result, null output shows pretty clearly that everything's OK, and 
anything non-null highlights which zone or conffile has a problem.
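To let a script or cron job act on the "null output means OK" convention, the filtered report can be turned into an exit status.  A minimal sketch, assuming a hypothetical helper named require_empty that reads the already-filtered report on stdin:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the filtered report is empty.
# Any non-empty report is echoed to stderr so a human still sees it.
require_empty() {
    report=$(cat)
    if [ -n "$report" ]; then
        printf '%s\n' "$report" >&2
        return 1    # something matched: treat the config as suspect
    fi
    return 0        # nothing matched: treat the config as clean
}
```

Appended to the end of the egrep-and-sort pipeline, it allows something like "... | sort -u | require_empty && rndc reload", so the reload only happens when the filter prints nothing.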

I actually just realised, by checking the unfiltered output, that I need
to add an item to the filter list, because of warning messages like
this:

reverse/1-26.0.168.192.in-addr.arpa:16: warning: edge.example.com.1/26.0.168.192.in-addr.arpa: bad name (check-names)


(Again, I'm substituting example.com for the real domain, and
192.168.0.0/26 for the real CIDR IP block.)  One of my colleagues had 
created reverse-DNS zonefile 1-26.0.168.192.in-addr.arpa for reverse
domain 1/26.0.168.192.in-addr.arpa with the following entry:

25   IN  PTR   edge.example.com

...thereby committing the second-most-common DNS mistake (after failing
to increment the serial number), because he meant to say:

25   IN  PTR   edge.example.com.

The error didn't break DNS, but, for lack of the trailing period, it
resulted in the reverse DNS for 25.1/26.0.168.192.in-addr.arpa becoming

   edge.example.com.1/26.0.168.192.in-addr.arpa

...which was not what he intended.
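The rule behind this mistake is that any name in a zonefile that doesn't end in a period is treated as relative, and the zone's origin (by default, the zone name from named.conf) is appended to it; "@" stands for the origin itself.  The function below is only an illustration of that rule under those assumptions (qualify_name is a hypothetical name, not anything in BIND), with the origin written in absolute form:

```shell
#!/bin/sh
# Illustration of zonefile name completion (not BIND's own code).
# qualify_name NAME ORIGIN prints the fully qualified form of NAME.
qualify_name() {
    name=$1 origin=$2
    case $name in
        @)  printf '%s\n' "$origin" ;;              # "@" means the origin
        *.) printf '%s\n' "$name" ;;                # absolute: use as-is
        *)  printf '%s.%s\n' "$name" "$origin" ;;   # relative: append origin
    esac
}
```

So "qualify_name edge.example.com 1/26.0.168.192.in-addr.arpa." prints edge.example.com.1/26.0.168.192.in-addr.arpa. (the name in the warning above, which BIND prints without the final period), while the corrected edge.example.com. comes back unchanged.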

So, I guess my revised test needs to be:

/usr/sbin/named-checkconf -z -t /var/named/chroot/ /etc/named.conf 2>&1 | \
egrep -e 'missing|not allowed|unknown|not at top of zone' \
  -e 'appears to be an address|no current owner name|MAXTTL|file not found' \
  -e 'may not be used with|outside epoch|in future|invalid|unsupported|no TTL' \
  -e 'ignoring|TTL set to prior TTL|bad name' | sort -u

