[sf-lug] pdns-recursor

Tue May 4 09:09:12 PDT 2010

Quoting Alex Kleider (a_kleider at yahoo.com):

> Thanks, Rick, for the tip. 
> 
> I've got 'unbound' installed and 
> # ps aux | grep unbound 
> tells me it's running.
> It's running on host 'plug' that has IP address 10.0.0.175
> 
> .. but it's still not actually being used for DNS!
> I can't seem to get the configuration figured out.
> If I instruct my dhcp server (also running on 'plug') to direct clients to 10.0.0.175 for dns, things fail.
> I would have thought that part of the unbound configuration would be
> to direct it to an outside DNS server (or servers) where it could
> begin it's queries but the documentation makes no mention of such an
> entry.

There is such a configuration.  It's a default. 

To be more precise, the daemon has an internal list of the 13 root
nameservers that it relies on to find the rest of the DNS world, e.g.,
the top-level domain nameservers for .COM namespace, the domain-specific
nameservers for linuxmafia.com namespace, etc.  You might be used to
forwarder-type nameservers that don't actually do the work of resolution
themselves, but just send all queries off to a full-service nameserver
whose IP is listed in the forwarder's conffile.  Unbound _is_ a
full-service nameserver, so all it needs to do its job is its internal
knowledge of what IPs the world's 13 root nameservers have.

Just in case the roster of root nameservers changes, which happens at
long intervals, Unbound's unbound.conf file supports a 'root-hints'
keyword, which can point to a list of root nameservers in standard
zonefile format.  You _might_ already find such a file referenced from
your unbound.conf .
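
As a rough illustration of what such a hints file contains, the sketch
below lists the IPv4 root-server addresses found in one. It assumes the
layout used by the published named.root file (name, TTL, record type,
address, with ';' starting a comment line). The function name and the
file path in the usage comment are hypothetical.

```shell
# root_hint_addresses FILE: print "name address" for every IPv4 (A)
# record in a root-hints file, skipping ';' comment lines.
root_hint_addresses() {
    awk '!/^;/ && $3 == "A" { print $1, $4 }' "$1"
}

# Usage (the path is an assumption; use whatever 'root-hints' names):
#   root_hint_addresses /etc/unbound/root.hints
```

Piping the result through 'wc -l' gives a quick count to compare
against the 13 roots mentioned above.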

Anyway, you might be thinking 'That's cool, but I don't see how it helps
me solve my problem.'  True, but I wanted to address your question.  ;->
Also, you should know how the thing works.

What you want to do is use a simple tool like 'dig' for diagnostic
purposes, to see _where_ things are failing.  So, for starters, log in to
the 'plug' host.  You're now on the machine where Unbound is running.
That host has (at least) two active network interfaces:  the loopback
interface has (as always) IP address 127.0.0.1, and one of the others,
probably eth0, has IP 10.0.0.175.  Let's send Unbound a query from the
local machine, on each of those interfaces:

$ dig  linuxmafia.com  @127.0.0.1  +short
$ dig  linuxmafia.com  @10.0.0.175  +short

You've asked Unbound to resolve 'linuxmafia.com' (looking up the
'A'-type forward-lookup record by default), and told the 'dig' utility
to pass that request to Unbound via first the loopback IP and then the
network-connected one.  The '+short' flag is a 'Just the facts, ma'am' 
request to return just the tersest possible response, omitting much of
the detail.  You might want to also see what the full response looks
like, without that flag.  

Among the details suppressed by the '+short' flag is any error-debugging
information such as the reasons _why_ you are getting a null response,
so it's good to get to know when '+short' is a bit too terse.  For
example, dig might be encountering error condition 'SERVFAIL', which
means it got a socket to a running DNS daemon, which reported that it's 
sick and unable to do work at the moment. 
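
One way to see that status without reading the whole response is to
pull it out of the header line that dig prints when '+short' is
omitted. This is only a sketch: the function name is made up, and it
assumes the usual ';; ->>HEADER<<- ... status: NAME, ...' header line.

```shell
# dig_status: read full dig output on standard input and print the
# status field (NOERROR, SERVFAIL, NXDOMAIN, REFUSED, ...).
# Prints nothing if no header line is present, e.g. after a timeout.
dig_status() {
    sed -n 's/.*->>HEADER<<-.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# Usage on the 'plug' host:
#   dig linuxmafia.com @127.0.0.1 | dig_status
```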

The 'dig' utility is really good for getting to the bottom of DNS
problems, and also for just observing how DNS works.  (Try the +trace
flag sometime.)  Many people are still being advised to use
'nslookup'; don't!  That's really bad advice, because nslookup is buggy
and gives provably wrong DNS answers in some cases.

Anyhow, the answers you get to the above-cited queries determine where
your problem is.  If neither one gives an answer, then either Unbound
isn't running at all, or something (firewall rules, overly tight
'interface' and 'access-control'-line ACLs in unbound.conf) is
preventing anything from querying Unbound at all.  Fix that.

If you're getting an answer on the 127.0.0.1 interface but not on the
10.0.0.175 one, again, the likely cause is overly tight 'interface' or
'access-control' settings in unbound.conf.
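
The cases so far can be sketched as a small shell helper. The function
names are invented for illustration. The helper filters out lines
starting with ';' because dig can print diagnostics in that form even
with '+short', and those should not count as an answer.

```shell
# short_answer NAME SERVER: run the '+short' query and drop any
# ';'-prefixed diagnostic lines, so "no answer" is an empty string.
short_answer() {
    dig "$1" "@$2" +short 2>/dev/null | grep -v '^;'
}

# diagnose LOOPBACK_ANSWER LAN_ANSWER: map the two results onto the
# cases described above.
diagnose() {
    if [ -z "$1" ] && [ -z "$2" ]; then
        echo "neither: Unbound not running, or firewall/interface/access-control blocks all queries"
    elif [ -n "$1" ] && [ -z "$2" ]; then
        echo "loopback only: check interface and access-control lines in unbound.conf"
    elif [ -n "$1" ] && [ -n "$2" ]; then
        echo "both: repeat the 10.0.0.175 query from the workstation"
    else
        echo "LAN address only: not a case covered above"
    fi
}

# Usage on 'plug' (not run here):
#   diagnose "$(short_answer linuxmafia.com 127.0.0.1)" \
#            "$(short_answer linuxmafia.com 10.0.0.175)"
```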

If you're getting an answer on _both_ interfaces, you should now log in to
your local workstation, the one you're trying to convince to talk to the
Unbound instance on 'plug', and re-submit the non-loopback query:

$ dig  linuxmafia.com  @10.0.0.175  +short

(Oh, and make sure you can ping 10.0.0.175, just as a sanity check.)

If you are unable to successfully query 10.0.0.175 via 'dig' from the
workstation, e.g., dig encounters a timeout, then I'll bet the ACLs in
Unbound's unbound.conf need rewriting to permit queries from your local
network.
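
To see which settings are actually in effect, something like the
following prints the uncommented 'interface:' and 'access-control:'
lines from the config file. The function name and the file path are
assumptions; distributions put unbound.conf in different places. As
far as I know, Unbound out of the box listens only on localhost and
refuses queries from other addresses, so for a LAN like this one you
would expect to need lines along the lines of 'interface: 10.0.0.175'
and 'access-control: 10.0.0.0/24 allow' (the /24 netmask is a guess
about this network).

```shell
# unbound_acl_lines FILE: print the active (uncommented) 'interface:'
# and 'access-control:' lines from an Unbound config file.
unbound_acl_lines() {
    grep -E '^[[:space:]]*(interface|access-control):' "$1"
}

# Usage (the path is an assumption):
#   unbound_acl_lines /etc/unbound/unbound.conf
```

After editing the file, 'unbound-checkconf' can catch syntax errors
before you restart the daemon.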

If the 'dig' query _does_ succeed, then your problem isn't DNS, and I
don't know what happened, because you've just vetted everything all the
way out, one step at a time.

Actually, I can think of one way that might happen:  Let's
say you had a DNS-using utility like a Web browser already running when
you configured your DHCP daemon to point clients towards 10.0.0.175 for
nameservice.  (By the way, did you verify that the desired IP
information actually _is_ propagating out from your DHCP daemon to the
client machine's /etc/resolv.conf file?  If not, there's your problem.)
User programs like Web browsers tend to have 'stub DNS resolvers' built
into them, limited-functionality DNS client software that's just a
little dumb, and picks up data from your system DNS client configuration
(/etc/resolv.conf, /etc/nsswitch.conf) at program launch time -- and
never revises it thereafter.  So, your already-running Web browser would
not have picked up resolv.conf changes from your DHCP daemon.
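
A quick way to check that propagation on the client is to look at the
'nameserver' lines in /etc/resolv.conf. The sketch below does so; the
function names are hypothetical, and it assumes the usual
'nameserver ADDRESS' line format.

```shell
# resolv_nameservers [FILE]: print the nameserver addresses listed in
# a resolv.conf-style file (default: /etc/resolv.conf).
resolv_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# uses_nameserver IP [FILE]: succeed if IP is one of those addresses.
uses_nameserver() {
    resolv_nameservers "${2:-/etc/resolv.conf}" | grep -Fxq "$1"
}

# Usage on the workstation:
#   uses_nameserver 10.0.0.175 && echo "DHCP handed out 'plug' for DNS"
```

If the address is there and 'dig' works, restart the program that was
already running before the change and try again.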