[sf-lug] sf-lug.com web site accessibility

Rick Moen rick at linuxmafia.com
Fri Feb 27 17:54:54 PST 2009


Quoting jim (jim at well.com):

>    seems the DNS server for sf-lug.com is not 
> functioning. 

Well, specifically, the _master_ nameserver has developed an attitude
problem and is refusing to play.  That leaves the slave nameserver
nothing to work with.  More below.

>    the box is working and browsing the ip address 
> ( 208.96.15.252 ) brings up the web site where 
> browsing www.sf-lug.com times out. 
>    dig sf-lug.com returns 
> ;; connection timed out; no servers could be reached 

When doing nameserver diagnosis, you really shouldn't just say 
"dig [record]", because then you have no idea where that query 
is going.  You want to use the "@" qualifier to send it to a relevant
nameserver.

By the way, if you don't specify a destination nameserver, dig sends
the query to the nameservers listed in /etc/resolv.conf (the conffile
for your local DNS client, i.e., the resolver library), normally trying
them in the order listed.
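To see which servers an unqualified "dig" will use, list the
"nameserver" lines in /etc/resolv.conf.  A minimal Python sketch,
assuming the usual resolv.conf format (one "nameserver <IP>" per line,
comment lines starting with "#" or ";").  The function name and the
sample addresses (RFC 5737 documentation ranges) are illustrative only:

```python
from typing import List


def resolv_conf_nameservers(text: str) -> List[str]:
    """Return the nameserver addresses listed in resolv.conf text, in file order."""
    servers = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines ('#' or ';' in the first column).
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers


example = """\
search example.org
nameserver 192.0.2.53
# nameserver 192.0.2.99
nameserver 198.51.100.53
"""
print(resolv_conf_nameservers(example))  # ['192.0.2.53', '198.51.100.53']
```

On a real system you would pass it the contents of /etc/resolv.conf.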

>    the name is registered with network solutions, 
> which shows sf-lug.com points to ns1.sf-lug.com and 
> ns2.sf-lug.com 

Correct.

>    dig ns1.sf-lug.com shows an internet A record of 
> 208.96.15.252 (this dns server is on the box itself). 
>    dig ns2.sf-lug.com shows an internet A record of 
> 192.144.195.186 which seems to be (or recently have 
    ^
> been) on the sf peninsula. 

You mis-typed a digit.

:r! dig ns2.sf-lug.com +short
198.144.195.186

I know that IP.  ;->

:r! dig -x 198.144.195.186 +short
linuxmafia.COM.

That's _my_ nameserver.
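"dig -x ADDR" is shorthand for a PTR query on the address's
reversed-octet name under in-addr.arpa (ip6.arpa for IPv6).  Python's
standard ipaddress module computes that name, which is handy if you
want to run the equivalent "dig -t ptr 186.195.144.198.in-addr.arpa
+short" by hand.  A minimal sketch (ptr_name is an illustrative helper):

```python
import ipaddress


def ptr_name(ip: str) -> str:
    """Return the reverse-lookup (PTR) owner name for an IPv4 or IPv6 address."""
    # reverse_pointer gives e.g. '186.195.144.198.in-addr.arpa'; add the root dot.
    return ipaddress.ip_address(ip).reverse_pointer + "."


print(ptr_name("198.144.195.186"))  # 186.195.144.198.in-addr.arpa.
```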

Let's query my nameserver specifically about "www.sf-lug.com":

:r! dig www.sf-lug.com @ns2.sf-lug.com 

;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 12433
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

Double-checking by asking for the SOA record:

:r! dig -t soa www.sf-lug.com @ns2.sf-lug.com

;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 6328
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

OK, that shows that my nameserver is failing (SERVFAIL) queries about
the sf-lug.com domain as a whole, not just the "A" record for
"www.sf-lug.com".

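If you script checks like these across several servers, the field to
extract is the response code in dig's HEADER line.  SERVFAIL means the
server tried and could not produce an answer, which is distinct from
REFUSED (declined by policy) and NXDOMAIN (the name does not exist).
A minimal Python sketch; the regular expression is an assumption about
dig's default text output, not a stable interface:

```python
import re
from typing import Optional

# dig's header line looks like:
# ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 12433
_STATUS_RE = re.compile(r"->>HEADER<<-.*?status:\s*([A-Z]+)")


def dig_status(output: str) -> Optional[str]:
    """Return the response code (NOERROR, SERVFAIL, NXDOMAIN, REFUSED, ...) from dig output."""
    match = _STATUS_RE.search(output)
    return match.group(1) if match else None


sample = """\
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 6328
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
"""
print(dig_status(sample))  # SERVFAIL
```

A query that never got an answer at all (such as "connection timed
out; no servers could be reached") has no HEADER line, so the sketch
returns None for it.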
Next question: is my nameserver _trying_ to serve DNS for that domain?
Looking in /etc/bind/named.conf.local:

zone "sf-lug.com" {
        type slave; 
        file "/var/cache/bind/sf-lug.com.zone";
        allow-query { any; };
        allow-transfer { none; };
        masters {
        //ns1.sf-lug.com is:
        208.96.15.252;
        };
};


So, the answer is "Yes, it is, serving as a secondary (slave) nameserver
that pulls zone transfers from master nameserver ns1.sf-lug.com, which
is at IP 208.96.15.252, and stores the received zone data in the
/var/cache/bind directory as file sf-lug.com.zone."

Do we have a zonefile sf-lug.com.zone in /var/cache/bind?

:r /var/cache/bind/sf-lug.com.zone


$ORIGIN .
$TTL 86400	; 1 day
sf-lug.com		IN SOA	ns1.sf-lug.com. jim.well.com. (
				2007102904 ; serial
				3600       ; refresh (1 hour)
				3600       ; retry (1 hour)
				1209600    ; expire (2 weeks)
				10800      ; minimum (3 hours)
				)
			NS	ns1.sf-lug.com.
			NS	ns2.sf-lug.com.
			A	208.96.15.252
			MX	5 mail.sf-lug.com.
			TXT	"v=spf1 a mx -all"
$ORIGIN sf-lug.com.
mail			A	208.96.15.252
ns1			A	208.96.15.252
ns2			A	198.144.195.186
www			A	208.96.15.252


:r! ls -l /var/cache/bind/sf-lug.com.zone
-rw-r--r-- 1 bind bind 470 2009-02-12 22:00 /var/cache/bind/sf-lug.com.zone

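Well, there's part of your answer:  Going by the file's timestamp, the
last successful sync with the master was on Feb. 12.  The records in
the zone have a Time to Live (TTL) of 1 day = 86400 seconds, but that
only tells caching resolvers how long to keep answers.  What matters
for a slave is the SOA "expire" value, 1209600 seconds = 2 weeks:  a
slave that can't refresh from its master for that long stops serving
the zone.  Feb. 12 plus two weeks is Feb. 26.  _So_, my nameserver does
have a copy of the zone, but it now regards that copy as expired.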
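The dates can be worked out directly.  A minimal Python sketch,
assuming (as the text does) that the file's timestamp of Feb. 12, 22:00
marks the last successful sync with the master.  It contrasts the two
timers in the zonefile:  the record TTL, which bounds how long resolvers
cache answers, and the SOA expire value, which bounds how long a slave
keeps answering for a zone it can no longer refresh:

```python
from datetime import datetime, timedelta

# Assumption (as in the text): the zonefile's mtime marks the last successful sync.
LAST_SYNC = datetime(2009, 2, 12, 22, 0)  # from `ls -l`, server local time
RECORD_TTL = 86400    # $TTL: how long resolvers may cache the records
SOA_EXPIRE = 1209600  # SOA expire: how long a slave serves without a refresh


def slave_expiry(last_sync: datetime, expire_seconds: int) -> datetime:
    """Moment after which a slave that cannot refresh stops answering for the zone."""
    return last_sync + timedelta(seconds=expire_seconds)


print(LAST_SYNC + timedelta(seconds=RECORD_TTL))  # 2009-02-13 22:00:00
print(slave_expiry(LAST_SYNC, SOA_EXPIRE))        # 2009-02-26 22:00:00
```

The second date falls the day before this message was written, which
matches the SERVFAIL replies above.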

And, here's a direct test of what I suspect is the root cause:


:r! dig -t axfr sf-lug.com @ns1.sf-lug.com
;; Connection to 208.96.15.252#53(208.96.15.252) for sf-lug.com failed: connection refused.

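Doing a "dig" with type "axfr" performs a manual zone transfer.
Attempting one _from_ the command line of linuxmafia.com aka
ns2.sf-lug.com (which is effectively what I just did, above) _should_
have worked, but it is being refused upstream at ns1.sf-lug.com, IP
address 208.96.15.252.  Note that this is a TCP-level "connection
refused":  zone transfers run over TCP port 53, and a refused connection
means nothing at that address is accepting connections on that port (or
a firewall is actively rejecting them), not that a running nameserver
looked at the request and declined it.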
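Because zone transfers use TCP, the same symptom can be checked without
dig by attempting a plain TCP connection to port 53.  A minimal Python
sketch; the function name and the four-way classification are
illustrative.  "refused" means the host replied with a reset (nothing
listening, or a firewall REJECT rule); "timeout" means no reply at all:

```python
import socket


def tcp_port_state(host: str, port: int = 53, timeout: float = 3.0) -> str:
    """Try a TCP connection and classify it as 'open', 'refused', 'timeout' or 'error'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered with a reset: nothing listening, or a firewall REJECT rule.
        return "refused"
    except socket.timeout:
        # No reply at all: host down, or packets silently dropped.
        return "timeout"
    except OSError:
        # Anything else (unreachable network, failed host name lookup, ...).
        return "error"
```

Run it as tcp_port_state("208.96.15.252") from a host that ought to be
able to reach ns1.  An "open" result only shows something is listening;
whether it will hand over the zone is then up to its allow-transfer
settings.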

And what is 208.96.15.252?

:r! dig -x 208.96.15.252 +short
208.96.15.252.servepath.com.

Over to you guys, Jim.  At the moment, my nameserver is set up to slave
to an IP that's, for some reason, no longer willing to send it zone
data.[1]

If you wish, I can easily switch my nameservice over to _master_
nameservice for the zone, until you can line up replacement master
nameservice somewhere.  That would un-break the DNS, because my 
nameserver would cease to regard its cached copy of the zone as expired.

In the longer term, please just let me know where _other_ than
208.96.15.252 my slave nameservice should try to pull from.  (Actually,
my default will be to take no action, which is probably what you want if
you think 208.96.15.252's refusal might be an error or accident that
will soon be fixed.)


>    sf-lug.org works as expected. 

Are you clear on how to check this?

1.  Get all authoritative nameservers from the "whois" record, and look
    up their IPs.
2.  Query each of those IPs (dig -t soa [domain] @[IP]) for the zone SOA
    record.  Make sure all return the same S/N in the SOA record.
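Step 2 is easy to automate once you have each server's "dig -t soa
[domain] @[IP] +short" answer.  A minimal Python sketch; it only parses
text you have already collected, and the dictionary keyed by server
name is an illustrative structure.  The sample answers are the
balug.org ones shown below:

```python
from typing import Dict


def soa_serial(short_answer: str) -> int:
    """Extract the serial from `dig -t soa ZONE @SERVER +short` output.

    The +short form prints one line: MNAME RNAME SERIAL REFRESH RETRY EXPIRE MINIMUM
    """
    fields = short_answer.split()
    if len(fields) != 7:
        raise ValueError(f"not a single SOA answer: {short_answer!r}")
    return int(fields[2])


def serials_agree(answers: Dict[str, str]) -> bool:
    """True if all servers returned the same SOA serial (False for an empty dict)."""
    serials = {soa_serial(text) for text in answers.values()}
    return len(serials) == 1


soa = "ns1.dreamhost.com. hostmaster.dreamhost.com. 2008070600 16991 1800 1814400 14400"
balug = {"NS1.DREAMHOST.COM": soa, "NS2.DREAMHOST.COM": soa, "NS3.DREAMHOST.COM": soa}
print(serials_agree(balug))  # True
```

An empty or error answer (e.g., a timed-out query) raises ValueError
rather than being silently counted, since a server that doesn't answer
is itself a finding.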


>    balug.org works as expected (although its internet 
> A record is 208.113.160.236 according to dns SERVER 
> 216.231.41.2 which is a speakeasy domain name server 
> host). this implies that balug has repointed its 
> domain name away from the box at servepath. 

:r! whois balug.org | grep "Name Server"
Name Server:NS1.DREAMHOST.COM
Name Server:NS2.DREAMHOST.COM
Name Server:NS3.DREAMHOST.COM
Name Server: 
Name Server: 
Name Server: 
Name Server: 
Name Server: 
Name Server: 
Name Server: 
Name Server: 
Name Server: 
Name Server: 


:r! dig -t soa balug.org @NS1.DREAMHOST.COM +short
ns1.dreamhost.com. hostmaster.dreamhost.com. 2008070600 16991 1800 1814400 14400

:r! dig -t soa balug.org @NS2.DREAMHOST.COM +short
ns1.dreamhost.com. hostmaster.dreamhost.com. 2008070600 16991 1800 1814400 14400

:r! dig -t soa balug.org @NS3.DREAMHOST.COM +short
ns1.dreamhost.com. hostmaster.dreamhost.com. 2008070600 16991 1800 1814400 14400

So, with identical S/Ns, unless there's something _really_ peculiar
going on, at least they have the same zonefile.  (There are freaky 
cases where, through someone's screwup, master and slave nameservers can
end up with differing zonefiles that nonetheless have the same S/N, but 
that's unlikely.)

Checking the "A" record for FQDN "www.balug.org" at the first
nameserver:

:r! dig www.balug.org @NS1.DREAMHOST.COM +short
208.113.160.236

:r! host NS1.DREAMHOST.COM
NS1.DREAMHOST.COM has address 66.33.206.206

:r! host NS2.DREAMHOST.COM
NS2.DREAMHOST.COM has address 208.96.10.221

:r! host NS3.DREAMHOST.COM
NS3.DREAMHOST.COM has address 66.33.216.216


And, er, _why_ are you querying "216.231.41.2", which isn't in balug.org's
authoritative list?  


:r! dig -x 216.231.41.2 +short
ns-legacy.speakeasy.net.

That IP _is_, as you say, a Speakeasy.net nameserver, but, the point is,
why are you asking it about balug.org at all?  

Yes, it did answer that query, but only in its role as a recursive
server.  It basically asked the Internet as a whole, which fetched data
that ultimately would have derived from one of the authoritative servers
at Dreamhost -- which means it was cached data and could have been 
obsolete.  In general, you'll get more timely and accurate DNS data by
querying the authoritative servers, not just any ol' nameserver that
happens to be near you.
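dig makes the distinction visible:  an answer from an authoritative
server carries the AA ("authoritative answer") bit, which dig lists as
"aa" on its ";; flags:" line, while a recursive server answering from
its cache leaves it clear.  (The SERVFAIL replies earlier show
"qr rd ra", with no "aa".)  A minimal Python sketch that checks for it;
the regular expression is an assumption about dig's default output
format:

```python
import re
from typing import Set

# Matches dig's flags line, e.g.
# ;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
_FLAGS_RE = re.compile(r"^;; flags:([^;]*);", re.MULTILINE)


def dig_flags(output: str) -> Set[str]:
    """Return the header flags (qr, aa, rd, ra, ...) from dig's default output."""
    match = _FLAGS_RE.search(output)
    return set(match.group(1).split()) if match else set()


def is_authoritative(output: str) -> bool:
    """True if the responding server set the AA (authoritative answer) bit."""
    return "aa" in dig_flags(output)


servfail_reply = ";; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0"
print(is_authoritative(servfail_reply))  # False
```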


>    i'll first try to discover what's up with the 
> ns2 host. default resolution will be to use the 
> network solutions DNS server. 

Er, feel free to telephone me if you need help investigating.  I happen
to know DNS pretty well.  As you can see above, there's literally
_nothing_ wrong with ns2.  It's trying its level best to pull down
zonefile data from ns1; ns1 is refusing.  Hence, since the zone's
two-week SOA expire interval ran out (around Feb. 26, two weeks after
the last successful sync on Feb. 12), all it's had is stale, expired
data.

[1] After writing the above, I re-read what you said about
208.96.15.252, namely "this dns server is on the box itself".
Therefore, the ultimate answer seems to be:  "Jim, you guys need to make
sure your master nameserver is running, if you expect my slave to pick
up zonefiles from it."




