[conspire] DNS linting tools & tutorials (was: Unbound + dnsmasqd on openwrt)
Rick Moen
rick at linuxmafia.com
Thu Apr 6 21:02:34 PDT 2017
[Replying to my own post for no particular reason except to add
something new to this thread.]
As I mentioned, there are depressingly many ways to do authoritative DNS
-- the kind where you publish actual DNS contents to the world that the
rest of the world uses -- and I could probably write a pretty useful
book about it, if I could remember all the pitfalls I now unconsciously
avoid without usually even thinking about them.
It'd be nice if there were tools to 'lint' your domain's DNS, to check
it for some standard list of Things to Not Do. There actually is a
pretty good one, an online checker called DNSreport hosted at
dnsstuff.com, but, dammit, it's proprietary and likely to remain so.
Maybe a decade ago, when the affable fellow who owns/operates
dnsstuff.com brought DNSreport online and its value -- and slick
presentation -- became apparent, I e-mailed him to compliment and thank
him, and politely ask if he'd consider open-sourcing the software & Web
templates, etc. He very gracefully replied with a pretty definitive
'no' to the idea of open source.
Since DNSreport came online, he's already introduced one usage
restriction after people came to rely on it for a few years: Now, after
you've used it on (IIRC) two domains, it prompts you to pay a
subscription fee for future use -- which you can reset by clearing HTTP
cookies (for now). Obviously, if he wants to get more strict about
monetising usage, he could just log querying IPs and then crack down on
us cookie-clearers.
If I experimented with DNSreport a bit on a bunch of domains, especially
badly operated ones ;-> , I could at least create a checklist of DNS
quality criteria to verify. Then, it would be a matter of chasing down
RFC recommendations for settings, etc.
Let me give you an example. Here is the vital 'SOA' (Start of
Authority) record for linuxmafia.com:
@   IN  SOA ns1.linuxmafia.com. rick.deirdre.net. (
            2016030200  ; serial
            7200        ; refresh 2 hours
            3600        ; retry 1 hour
            2419200     ; expire 28 days
            900         ; negative TTL 15 mins
            )
The subfields with text labels 'refresh', 'retry', 'expire', and
'negative TTL' are all quite important to zone persistence in caching,
communication between master and slave nameservers, and (in the case of
negative TTL) how long a 'this doesn't exist in DNS' response (called an
'NXDOMAIN' response) should be cached and reused before even bothering
to check it again.
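To make those subfields concrete, here is a minimal Python sketch that pulls the numeric values out of the SOA text quoted above and prints the timers in friendlier units. The helper names (parse_soa_timers, humanize) are made up for illustration, and the parser only copes with the simple parenthesised layout shown here, not with everything a real zonefile may contain.

```python
import re

# Zone text copied from the SOA record quoted above.
SOA_TEXT = """
@ IN SOA ns1.linuxmafia.com. rick.deirdre.net. (
    2016030200 ; serial
    7200       ; refresh 2 hours
    3600       ; retry 1 hour
    2419200    ; expire 28 days
    900        ; negative TTL 15 mins
)
"""

FIELDS = ("serial", "refresh", "retry", "expire", "negative_ttl")


def parse_soa_timers(text):
    """Return the five numeric SOA subfields as a dict (hypothetical helper)."""
    # Drop ';' comments so digits inside them are not mistaken for values.
    stripped = re.sub(r";[^\n]*", "", text)
    # Only look between the parentheses, where the numeric subfields live.
    inside = stripped[stripped.index("(") + 1:stripped.index(")")]
    numbers = [int(tok) for tok in inside.split()]
    if len(numbers) != len(FIELDS):
        raise ValueError("expected 5 numeric SOA subfields")
    return dict(zip(FIELDS, numbers))


def humanize(seconds):
    """Render seconds in the largest unit that divides them evenly."""
    for unit, size in (("d", 86400), ("h", 3600), ("min", 60)):
        if seconds >= size and seconds % size == 0:
            return f"{seconds // size} {unit}"
    return f"{seconds} s"


if __name__ == "__main__":
    for name, value in parse_soa_timers(SOA_TEXT).items():
        if name != "serial":
            print(f"{name}: {value} s = {humanize(value)}")
```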
Every one of those subfields has ranges of values recommended in the
RFCs, and following those recommendations is strongly advisable unless
you're very sure you should do otherwise -- yet I see wildly wrong,
ill-advised values used for people's domains all the time. Why?
Because people get wacky ideas or just pick a number out of the air, and
never check anything that gives them 'You really shouldn't do that'
feedback.
DNSreport _does_ give you that feedback.
(http://mxtoolbox.com/DNSCheck.aspx is also pretty good and in some ways
better -- but not open source, either.)
For illustrative purposes, I'll pick on Steve Litt's GOLUG in Central
Florida, golug.org. Selected criticisms from a DNSreport check follow:
WARN: Parent zone providing NS records: Parent zone does not provide
glue for nameservers, which will cause delays in resolving your domain
name. The following nameserver addresses were not provided by the parent
'glue' and had to be looked up individually. This is perfectly
acceptable behavior per the RFCs. This will usually occur if your DNS
servers are not in the same TLD [RM: top-level domain] as your domain
(for example, a DNS server of "ns1.example.org" for the domain
"example.com"). In this case, you can speed up the connections slightly
by having NS records that are in the same TLD as your domain.
ns1.linode.com. | No Glue | TTL=86400
ns2.linode.com. | No Glue | TTL=86400
Basically, if golug.org's nameservers were referenced using names within
the .org TLD instead of the .com one, the parent .org zone could hand
out their IP addresses as 'glue' along with their names, sparing
resolvers the extra lookups.
The concept of glue records is vital to understanding DNS, specifically
how authority flows down from the root nameservers for the Internet to
(in this case) the nameservers for the .org TLD, then from those
nameservers to golug.org's nameservers. The glue records are what make
this flow down from the root work at all. Those are the records in each
level that specify where the nameservers are for the next level down,
e.g., the NS lines in .org's zonefile that enable finding of golug.org's
nameservers.
_If_ one's nameservers are specified using names inside the domain's
same TLD, then the response to a query to the TLD's nameservers of 'What
are the names of this domain's nameservers?' will return not just the
requested names but also the IPs those names resolve to. E.g., asking
.com's nameservers 'What are the names of the nameservers for
linuxmafia.com?' will return not just the five authoritative
nameservers' names (the requested data) but also the IPs corresponding
to each of those names -- because all of the names are themselves inside
.com, thus in the 'bailiwick' of .com's own nameservers. I carefully
ensured this situation to get the performance advantage.
By contrast, golug.org's nameservers aren't inside the .org namespace,
therefore not in .org nameservers' bailiwick, and the latter TLD
nameserver doesn't know their IPs, hence cannot furnish that data. So,
the client asking about golug.org's nameservers needs to make additional
follow-up queries to get the golug.org nameservers' IPs before being
able to (finally) ask them questions.
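The same-TLD test described above is easy to sketch in Python. This is a deliberately naive illustration, not DNSreport's logic: it compares only the last label of each name, so it follows the framing in this post and ignores multi-label suffixes such as co.uk as well as whatever a given registry actually does about glue. The function names are made up for the example.

```python
def tld(name):
    """Return the last label of a DNS name, ignoring any trailing dot."""
    return name.rstrip(".").rsplit(".", 1)[-1].lower()


def nameservers_outside_tld(domain, nameservers):
    """List the nameserver names whose TLD differs from the domain's.

    Per the explanation above, these are the ones a client would have to
    look up separately because the parent cannot supply their IPs.
    """
    return [ns for ns in nameservers if tld(ns) != tld(domain)]


if __name__ == "__main__":
    # Nameserver names taken from the DNSreport output quoted above.
    print(nameservers_outside_tld(
        "golug.org", ["ns1.linode.com.", "ns2.linode.com."]))
    print(nameservers_outside_tld(
        "linuxmafia.com", ["ns1.linuxmafia.com."]))
```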
WARN: Stealth nameservers: One or more stealth nameservers
discovered. This means that one or more nameservers are not listed at
both the parent and authoritative nameservers. This can be confusing and
can cause delays or other hard to diagnose inconsistencies. The stealth
nameservers discovered are:
[snip very long detail listing]
This warning is, in the case of golug.org, a bit misleading because
DNSreport isn't handling IPv6-based results quite intelligently enough,
so the problems cited appear to be false positives.
A 'stealth nameserver' is one that is used for conveying information to
the listed authoritative nameservers but that is not itself listed.
Sometimes, this is intended, but in the general case it's accidental and
likely to impair DNS functionality and consistency.
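The check DNSreport describes (nameservers 'not listed at both the parent and authoritative nameservers') boils down to comparing two sets of names. A minimal Python sketch follows, assuming you have already collected the two NS lists by some other means; the example.org names are placeholders, not real data, and ns_mismatch is a made-up helper name.

```python
def normalize(name):
    """Lower-case a DNS name and drop the trailing dot for comparison."""
    return name.rstrip(".").lower()


def ns_mismatch(parent_ns, zone_ns):
    """Report nameservers listed on only one side of the delegation."""
    parent = {normalize(n) for n in parent_ns}
    zone = {normalize(n) for n in zone_ns}
    return {
        "only_at_parent": sorted(parent - zone),
        "only_in_zone": sorted(zone - parent),
    }


if __name__ == "__main__":
    # Hypothetical NS sets: what the parent delegates to, and what the
    # zone's own NS records say.
    parent_side = ["ns1.example.org.", "ns2.example.org."]
    zone_side = ["NS1.example.org", "ns2.example.org", "ns3.example.org"]
    print(ns_mismatch(parent_side, zone_side))
```

An empty list on both sides means the parent and the zone agree; anything else is worth a closer look before deciding whether it is intentional.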
WARN: SOA field check: One or more SOA fields are outside
recommended ranges. Values that are out of specifications could cause
delays in record updates or unnecessary network traffic. The SOA fields
out of range are:
retry | 14400 | RETRY - needs to be less than or equal to half the
REFRESH.
'retry' and 'refresh' are (to re-explain) subfields of the SOA record,
both related to how slave (secondary) nameservers are being instructed
to interact with the master (primary) nameserver that supplies the zone
to them via information-sharing sessions called 'zone transfers'.
retry: how long a name server should wait to retry an attempt to get fresh
zone data from the primary name server if the first attempt should fail.
refresh: how often a name server should check its primary server to
see if there have been any updates to the zone
Decades ago, the 'refresh', 'retry', and 'expire' values were the main
means by which master/slave communication for zone transfers was
regulated. 'refresh' is how old a slave nameserver's copy of the zone
may become before the slave should check in, of its own volition, and
seek to pull down a fresh copy. 'retry' is how often to try again if
that fails. 'expire' is how old a slave's copy of the zone may become
(because efforts to refresh aren't working) before the slave is required
to discard all zone information as too stale.
Those three controls still exist but are now primarily a fallback
measure, because of a newer measure carried out at the master nameserver
every time it has new information in the zone: It sends out what is
called a DNS 'NOTIFY' signal to each slave for the zone, essentially
saying 'Hey, there's new stuff. You should do a zone transfer to pull
it down and update your copy.'
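The fallback timers can be modelled as a small decision function. This Python sketch is a simplified picture of the behaviour described above, not how any particular nameserver implements it: it ignores NOTIFY entirely, and the function and argument names are invented for the example.

```python
def secondary_action(age_of_copy, since_failed_attempt, refresh, retry, expire):
    """Decide what a slave nameserver should do with its copy of a zone.

    age_of_copy: seconds since the copy was last confirmed fresh.
    since_failed_attempt: seconds since the most recent failed attempt
        to reach the master, or None if nothing has failed since then.
    refresh, retry, expire: the SOA timers, in seconds.
    """
    if age_of_copy >= expire:
        return "discard zone"          # too stale to keep serving
    if age_of_copy < refresh:
        return "serve copy"            # not yet time to check in
    if since_failed_attempt is None or since_failed_attempt >= retry:
        return "contact master"        # refresh due, or retry interval over
    return "serve copy, wait for retry"


if __name__ == "__main__":
    # Timer values from the linuxmafia.com SOA record quoted earlier.
    timers = dict(refresh=7200, retry=3600, expire=2419200)
    print(secondary_action(3600, None, **timers))
    print(secondary_action(7200, None, **timers))
    print(secondary_action(9000, 1800, **timers))
    print(secondary_action(2419200, 0, **timers))
```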
Anyway, golug.org has 'refresh' and 'retry' _both_ set to 14400 seconds
= 4 hours, which means 'retry' is way too large to do anything useful.
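The rule DNSreport quotes (RETRY less than or equal to half the REFRESH) is simple enough to check yourself. A minimal Python sketch, with check_retry being a made-up name and the rule taken from DNSreport's message above rather than verified against any RFC text:

```python
def check_retry(refresh, retry):
    """Apply the quoted DNSreport rule: retry must be <= refresh / 2."""
    if 2 * retry <= refresh:
        return "PASS"
    return f"WARN: retry {retry} is more than half of refresh {refresh}"


if __name__ == "__main__":
    print(check_retry(14400, 14400))  # golug.org's values
    print(check_retry(7200, 3600))    # linuxmafia.com's values
```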
FAIL Acceptance of 'abuse': Mailserver rejected mail to 'abuse'.
Mailservers are required by RFC2142 Section 2 to have a valid
abuse address that is accepting mail.
192.241.244.162 | unexpected response to [RCPT TO: ] | 550 Mailbox
does not exists
Indeed, this is a serious problem. Technically this is a problem
in golug.org's mail server rather than the DNS, but DNSreport tries
to find problems in domain _management_ along with DNS issues as such
-- and the fact is that domains willing to accept mail _are_ indeed
required to accept mail for two specific mailboxes, postmaster and
abuse. The abuse mailbox is a mandated standard address for complaints
about mail abuse.
Many sites doing SMTP will blanket-refuse mail from any domain that
refuses mail to postmaster@[domain] or abuse@[domain], so it's strongly
in your interest to comply. (Why the refusal? Mostly because
mail-sending domains not bothering with RFC-compliance correlates
strongly with 'spam host', so that is part of the low-hanging fruit of
automated spam-rejection.)
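A check along these lines can be approximated with Python's standard smtplib: open an SMTP session, issue RCPT TO for each required mailbox, and look at the reply code, without ever sending a message. This is only a sketch under assumptions: the sender address and the choice of which reply codes count as acceptance are mine, many hosts block outbound port 25, and I don't know how DNSreport actually does its probe.

```python
import smtplib

REQUIRED_LOCAL_PARTS = ("postmaster", "abuse")


def required_addresses(domain):
    """Build the two mailboxes discussed above for a given domain."""
    return [f"{local}@{domain}" for local in REQUIRED_LOCAL_PARTS]


def rcpt_accepted(code):
    """Treat 250 (OK) and 251 (will forward) as acceptance (my assumption)."""
    return code in (250, 251)


def probe(mail_host, domain, sender="probe@example.com", timeout=10):
    """Ask mail_host whether it accepts the required mailboxes.

    Needs network access. 'sender' is a placeholder address; substitute
    one you control. Returns {address: (code, reply_text)}.
    """
    results = {}
    with smtplib.SMTP(mail_host, 25, timeout=timeout) as smtp:
        smtp.ehlo_or_helo_if_needed()
        smtp.mail(sender)
        for address in required_addresses(domain):
            code, reply = smtp.rcpt(address)
            results[address] = (code, reply.decode(errors="replace"))
    return results


if __name__ == "__main__":
    print(required_addresses("golug.org"))
    # The 550 reply quoted in the DNSreport output above counts as refusal.
    print(rcpt_accepted(550))
```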
DNSreport's other comments (the 'PASS' items) are also illuminating, so
I recommend playing with it. (Be aware that there are some bugs, e.g.
it currently reports some things about linuxmafia.com DNS that are not
true.)
On another note, I'm betting that this set of public Web pages is a
top-quality tutorial on DNS: http://www.zytrax.com/books/dns/