[conspire] Fun <cough, cough, cough> with DNS ; -) ... or ... let you count the ways ... ; ->

Michael Paoli Michael.Paoli at cal.berkeley.edu
Fri Mar 30 04:00:34 PDT 2018


So, ... "answers", for ...:

> From: "Michael Paoli" <Michael.Paoli at cal.berkeley.edu>
> Subject: Fun <cough, cough, cough> with DNS ;-) ... or ... let you  
> count the ways ... ;->
> Date: Thu, 22 Mar 2018 04:55:33 -0700

> So ... *maybe* a fun 'lil exercise/challenge.  ;-)
>
> I was almost thinking of bringing it up at or just before the last
> CABAL meeting.  So, ... DNS booboos, troubleshooting, ...
>
> So, quite recently before the CABAL meeting, had encountered ...
> well, okay, maybe let's play a game.  :-)
>
> Start, oh, ... say ... here:
> wsgba-nno.nabmig.synchronoss.net.
> Mmmm... let's say, oh, ... how fast can you find out what, if any,
> A and/or AAAA records exist for that DNS name (just to give one a
> starting point).  Oh, and ... say from there, how many significant
> classes/categories of DNS errors can one find?  Ooooh, game?  :-)
> Sure, why not?  How 'bout ...
> o +20 points for each significant class/category of DNS errors one can find
> o -15 points if you use some on-line web checker thingy
> o +5 points for each not strictly DNS but in or closely related to DNS issue
> o +20 points for using dig and/or delv
> o +10 points for using both dig and delv and stating/showing a good
> reason why
> o +10 points if you use delv but not dig and can give good reasons why
> o -10 points if you use nslookup
> o +15 points - DNSSEC - show what right/wrong/nothing they're doing there
> o +5 points - suggest how they could monitor to detect common DNS errors
> o Oh, and there may be a time factor ... they may eventually fix their errors
>
> Tip: you'll probably need IPv6 to fully test their DNS, but you may not
> need IPv6 connectivity to The Internet to find all or most all their DNS
> booboos (I strongly suspect each of their NS servers with IPv6 AAAA
> records is simply single host that's dual stacked with IPv4 and IPv6 - but
> I don't know for sure).
>
> So, ... who gets the high score?  :-)  Your opponent is the DNS provider,
> their objective is to make your score as low as possible ... they're not
> playing a very good game 8-O ... but to give 'em some credit, they did fix
> about half of their major DNS booboos since I told 'em about their issues
> shortly before the last CABAL meeting ... but, yeah, they still have a
> significant way to go on stepping up their game.

And, further below, slightly redacted and with some formatting tweaks,
the earlier analysis I'd happened to do the day before that CABAL meeting.

< From: Paoli, Michael
< Date: Fri, Mar 9, 2018 at 4:38 PM
< Subject: How *not* to do DNS

< FYI, example of how not to do DNS:

< o 3 of 4 NS IPs fail
< o the one that doesn't completely fail, fails TCP
< o authority records on NS data and authoritative NS data from NS
<   itself are inconsistent and completely different
< o TTL on NS record is 0.

< 8-O

< There are more errors, but those top the list.  <sigh>

< From: Paoli, Michael
< Sent: Friday, March 09, 2018 4:28 PM
< Subject: RE: DNS issues (wsgba-nno.nabmig.synchronoss.net. & *.nabmig.synchronoss.net.): RE:

< This check also shows some more (but not all of) the errors:
< http://dnscheck.iis.se/?time=1520641388&id=7620478&view=basic&test=standard

< From: Paoli, Michael
< Sent: Friday, March 09, 2018 4:22 PM
< Subject: DNS issues (wsgba-nno.nabmig.synchronoss.net. & *.nabmig.synchronoss.net.): RE:

< DNS issues ...

< This part is fine and quite fast enough:
< [REDACTED] 7200 IN CNAME wsgba-nno.nabmig.synchronoss.net.

< Issues/failures however, happen and also cause delays (also outright
< failures) on getting IP address(es) for
< wsgba-nno.nabmig.synchronoss.net., most notably 3 of 4 NS IPs fail,
< and NS TTL is 0, thus disallowing caching and forcing lookups each
< time to NS IPs ... 3 out of 4 of which fail.

< e.g.:
< $ dig +trace wsgba-nno.nabmig.synchronoss.net. A wsgba-nno.nabmig.synchronoss.net. AAAA

< ...
< synchronoss.net.        172800  IN      NS      ns1.synchronoss.com.
< synchronoss.net.        172800  IN      NS      ns2.synchronoss.com.
< ;; Received 629 bytes from 192.52.178.30#53(k.gtld-servers.net) in 61 ms
< ...
< nabmig.synchronoss.net. 1800    IN      NS      nabseagtm01.nabmig.synchronoss.net.
< nabmig.synchronoss.net. 1800    IN      NS      nabcuagtm01.nabmig.synchronoss.net.
< ;; Received 201 bytes from 68.170.16.16#53(ns1.synchronoss.com) in 125 ms
< ...
< ;; connection timed out; no servers could be reached
< $

< Also seeing inconsistent results for answers on queries for
< nameservers (authority NS and "glue" records should be consistent) and
< 0 TTLs on some NS RRs (TTL of 0 disables caching, thus forces lookup
< each time - and 3 of the 4 NS IPs fail (see also further below)):

< $ dig +noall +answer nabmig.synchronoss.net. NS
< nabmig.synchronoss.net. 0       IN      NS      nabgtm.nabmig.synchronoss.net.
< $ dig @68.170.29.212  +noall +norecurse +answer nabmig.synchronoss.net. NS
< nabmig.synchronoss.net. 0       IN      NS      nabgtm.nabmig.synchronoss.net.
< $ dig +noall +answer nabseagtm01.nabmig.synchronoss.net. A nabseagtm01.nabmig.synchronoss.net. AAAA nabcuagtm01.nabmig.synchronoss.net. A nabcuagtm01.nabmig.synchronoss.net. AAAA
< nabseagtm01.nabmig.synchronoss.net. 176 IN A    68.170.29.212
< nabseagtm01.nabmig.synchronoss.net. 176 IN AAAA 2620:11b:4002:1::1:4
< nabcuagtm01.nabmig.synchronoss.net. 176 IN A    68.170.28.212
< nabcuagtm01.nabmig.synchronoss.net. 176 IN AAAA 2620:11b:4000:1::1:4
< $
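Both checks above (0 TTLs on NS RRs, and the delegation's NS set vs. the
NS set the zone's own server hands out) can be scripted against dig
output.  A minimal sketch, not what was actually run: the function names
zero_ttl_ns and ns_names are made up for illustration, and the field
positions assume dig's usual "name TTL class type rdata" record layout.

```shell
# zero_ttl_ns: read dig record lines on stdin and print any NS records
# whose TTL (2nd field) is 0.  Prints nothing if there are none.
zero_ttl_ns() {
    awk '$4 == "NS" && $2 == 0 { print }'
}

# ns_names: read dig record lines on stdin and print the lower-cased,
# sorted, de-duplicated targets (5th field) of all NS records.
ns_names() {
    awk '$4 == "NS" { print tolower($5) }' | sort -u
}
```

Feeding it the two views, e.g.
"dig @68.170.16.16 +norecurse +noall +authority nabmig.synchronoss.net. NS | ns_names"
for the delegation and
"dig @68.170.29.212 +norecurse +noall +answer nabmig.synchronoss.net. NS | ns_names"
for the zone's own server, and comparing the two results as strings would
flag the mismatch shown above; any output at all from zero_ttl_ns flags
the TTL problem.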

< 3 out of 4 nameserver IP addresses are failing:
< $ (for nsip in 68.170.28.212 2620:11b:4000:1::1:4 68.170.29.212 2620:11b:4002:1::1:4; do dig @"$nsip" +noall +norecurse +answer wsgba-nno.nabmig.synchronoss.net. A wsgba-nno.nabmig.synchronoss.net. AAAA | sed -e 's/$/ ['"$nsip"']'/; done)
< wsgba-nno.nabmig.synchronoss.net. 30 IN A 68.170.28.18 [68.170.28.212]
< ;; connection timed out; no servers could be reached [68.170.28.212]
< ;; connection timed out; no servers could be reached [2620:11b:4000:1::1:4]
< ;; connection timed out; no servers could be reached [2620:11b:4000:1::1:4]
< wsgba-nno.nabmig.synchronoss.net. 30 IN A 68.170.28.18 [68.170.29.212]
< ;; connection timed out; no servers could be reached [68.170.29.212]
< ;; connection timed out; no servers could be reached [2620:11b:4002:1::1:4]
< ;; connection timed out; no servers could be reached [2620:11b:4002:1::1:4]
< $ dig +noall +answer nabgtm.nabmig.synchronoss.net. A nabgtm.nabmig.synchronoss.net. AAAA
< nabgtm.nabmig.synchronoss.net. 284 IN   A       68.170.29.212
< nabgtm.nabmig.synchronoss.net. 284 IN   A       68.170.28.212
< nabgtm.nabmig.synchronoss.net. 284 IN   AAAA    2620:11b:4000:1::1:4
< nabgtm.nabmig.synchronoss.net. 284 IN   AAAA    2620:11b:4002:1::1:4
< $
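The per-server output above (each line tagged with " [server-ip]" by the
sed in the loop) can be boiled down to a per-IP tally.  A rough sketch,
assuming that exact tagging and dig's "connection timed out" wording;
ns_tally is a made-up name, not an existing tool:

```shell
# ns_tally: read lines of the form "<dig output> [server-ip]" on stdin and
# print "<server-ip> ok=<n> fail=<n>" per server, in first-seen order.
# A line counts as a failure if it is dig's timeout message, else as ok.
ns_tally() {
    awk '
    {
        ip = $NF                       # last field is "[ip]"
        gsub(/^\[|\]$/, "", ip)        # strip the surrounding brackets
        if (!(ip in seen)) { seen[ip] = 1; order[++n] = ip }
        if ($0 ~ /^;; connection timed out/) fail[ip]++
        else ok[ip]++
    }
    END {
        for (i = 1; i <= n; i++)
            printf "%s ok=%d fail=%d\n", order[i], ok[order[i]], fail[order[i]]
    }'
}
```

Piping the loop's output through ns_tally gives one line per NS IP, which
is easier to eyeball (or alert on) than the raw transcript.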

< All four NS IPs fail TCP connection:
< $ (for nsip in 68.170.28.212 2620:11b:4000:1::1:4 68.170.29.212 2620:11b:4002:1::1:4; do nc -vz -w 10 "$nsip" 53; done)
< nc: connect to 68.170.28.212 port 53 (tcp) timed out: Operation now in progress
< nc: connect to 2620:11b:4000:1::1:4 port 53 (tcp) timed out: Operation now in progress
< nc: connect to 68.170.29.212 port 53 (tcp) timed out: Operation now in progress
< nc: connect to 2620:11b:4002:1::1:4 port 53 (tcp) timed out: Operation now in progress
< $

< The issues also appear quite consistent if checked from either
< [REDACTED], or The Internet.

< A somewhat simplistic on-line DNS check of domain also shows many, but
< not all, of the errors:
< http://dnscheck.pingdom.com/?domain=nabmig.synchronoss.net&timestamp=1520640438&view=1

< FYI, traceroute data:

< $ (for nsip in 68.170.28.212 2620:11b:4000:1::1:4 68.170.29.212 2620:11b:4002:1::1:4; do for proto in U T; do echo "$proto:"; sudo traceroute -n"$proto"p 53 -f 5 -m 15 "$nsip"; done; done)
< U:
< traceroute to 68.170.28.212 (68.170.28.212), 15 hops max, 60 byte packets
< 5  154.54.43.70  15.694 ms 154.54.43.150  17.784 ms 154.54.43.70  19.634 ms
< 6  154.54.1.194  22.546 ms  24.927 ms  27.047 ms
< 7  * * *
< 8  * * *
< 9  4.31.104.10  98.373 ms 4.59.145.26  103.311 ms 4.31.104.10  103.794 ms
< 10  * 198.17.50.55  97.010 ms  101.944 ms
< 11  68.170.28.195  111.583 ms  122.304 ms  126.656 ms
< 12  68.170.28.196  125.285 ms  93.058 ms  99.947 ms
< 13  * * *
< 14  * * *
< 15  * * *
< T:
< traceroute to 68.170.28.212 (68.170.28.212), 15 hops max, 60 byte packets
< 5  154.54.43.150  15.649 ms  17.957 ms  20.361 ms
< 6  154.54.1.194  22.800 ms  24.964 ms *
< 7  * * *
< 8  * * *
< 9  4.59.145.26  104.272 ms  105.255 ms  108.228 ms
< 10  * * *
< 11  * * *
< 12  * * *
< 13  * * *
< 14  * * *
< 15  * * *
< U:
< traceroute to 2620:11b:4000:1::1:4 (2620:11b:4000:1::1:4), 15 hops max, 80 byte packets
< 5  2001:470:0:389::2  264.904 ms  267.920 ms  271.975 ms
< 6  2001:1900::3:189  384.752 ms  385.912 ms  386.952 ms
< 7  * * *
< 8  * * *
< 9  * * *
< 10  * * *
< 11  * * *
< 12  * * *
< 13  * * *
< 14  * * *
< 15  * * *
< T:
< traceroute to 2620:11b:4000:1::1:4 (2620:11b:4000:1::1:4), 15 hops max, 80 byte packets
< 5  2001:470:0:389::2  256.858 ms  260.771 ms  264.133 ms
< 6  * 2001:1900::3:189  366.564 ms  369.456 ms
< 7  * * *
< 8  * * *
< 9  * * *
< 10  * * *
< 11  * * *
< 12  * * *
< 13  * * *
< 14  * * *
< 15  * * *
< U:
< traceroute to 68.170.29.212 (68.170.29.212), 15 hops max, 60 byte packets
< 5  154.54.43.150  376.941 ms 154.54.43.70  379.314 ms 154.54.43.150  380.941 ms
< 6  154.54.5.102  383.172 ms  385.362 ms *
< 7  * * 4.68.110.137  387.584 ms
< 8  * * *
< 9  4.31.98.186  413.277 ms  414.936 ms  416.895 ms
< 10  * 74.116.106.76  44.078 ms  48.313 ms
< 11  68.170.29.203  52.217 ms 68.170.29.201  57.582 ms 68.170.29.203  61.305 ms
< 12  * * *
< 13  * * *
< 14  * * *
< 15  * * *
< T:
< traceroute to 68.170.29.212 (68.170.29.212), 15 hops max, 60 byte packets
< 5  154.54.43.70  17.733 ms  19.041 ms 154.54.43.150  34.556 ms
< 6  154.54.1.194  34.596 ms  34.744 ms *
< 7  * * 4.68.110.137  34.684 ms
< 8  * * *
< 9  4.31.98.186  54.960 ms  56.917 ms  59.304 ms
< 10  * * *
< 11  * * *
< 12  * * *
< 13  * * *
< 14  * * *
< 15  * * *
< U:
< traceroute to 2620:11b:4002:1::1:4 (2620:11b:4002:1::1:4), 15 hops max, 80 byte packets
< 5  2001:470:0:389::2  18.358 ms  21.621 ms  24.471 ms
< 6  2001:1900::3:197  45.736 ms  55.685 ms  59.062 ms
< 7  * * *
< 8  * * *
< 9  * * *
< 10  * * *
< 11  * * *
< 12  * * *
< 13  * * *
< 14  * * *
< 15  * * *
< T:
< traceroute to 2620:11b:4002:1::1:4 (2620:11b:4002:1::1:4), 15 hops max, 80 byte packets
< 5  2001:470:0:389::2  61.447 ms  64.501 ms  67.551 ms
< 6  2001:1900::3:197  88.716 ms  92.369 ms *
< 7  * * *
< 8  * * *
< 9  * * *
< 10  * * *
< 11  * * *
< 12  * * *
< 13  * * *
< 14  * * *
< 15  * * *
< $

< From: [REDACTED]
< Sent: Friday, March 09, 2018 2:20 PM

< Verified the connectivity to [REDACTED]

< However, the command below is slow because the ns lookup takes long.
< telnet [REDACTED] 443

< If I use the IP directly, it's fast:
< telnet 68.170.28.18 443
< Can you please check it out?

So ... "answers" :-)  Having recently taught someone about rot13,
semi-tempted to ... anyway, been long enough now, eh?
And, to give 'em *some* credit, they've since fixed several of the DNS
errors that were present at that earlier time ... but alas, last I
checked, several key errors still remain (egad, I have to attend yet
another meeting next week to explain to 'em how their DNS is messed up).

So, some of the many key issues shown in my earlier analysis (some of
which have since been corrected):
o 3 of 4 NS servers not answering queries (mostly(?) corrected - that
   may also have been caused or exacerbated by traffic load due to the
   other errors - the RR name itself gets fairly to quite high traffic
   (I think hundreds to thousands or more connections per second), so
   certain screw-ups in DNS would also cause the related DNS traffic to
   spike heavily).
o NS TTL of 0 forcing no cache and traffic always to authoritative
   servers (corrected)
o authoritative servers fail to respond to AAAA queries (a response of
   "we ain't got no AAAA records for that" would be fine - but failing to
   respond is not - as that's a failure, and can't be cached, so all the
   queries for such are forced to wait for the timeout failure every time).
o 0 of 4 NS server IPs function over TCP
o inconsistent DNS data between delegating authority NS and
   authoritative server (corrected) - I didn't include all that data in
   what's shown in those emails (was in other captured data/emails) ...
   but that might still be partially or fully available at some of those
   DNS test URLs showing the earlier data.
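On the "how could they monitor for this" angle: the AAAA issue above
hinges on telling "no response at all" apart from a perfectly good "name
exists, no records of that type" answer.  A minimal sketch of that
distinction, working on dig's full default output; dig_classify is a
made-up name, and the patterns assume dig's usual header and timeout
wording:

```shell
# dig_classify: read full dig output (default format) on stdin; print one of:
#   timeout  - dig reported that no server could be reached
#   answer   - status NOERROR with at least one answer record
#   nodata   - status NOERROR with zero answers (name exists, no such type)
#   <rcode>  - any other status, lower-cased (nxdomain, servfail, ...)
#   unknown  - none of the above could be determined
dig_classify() {
    awk '
    /connection timed out|no servers could be reached/ { timedout = 1 }
    /status:/ {
        s = $0; sub(/.*status: /, "", s); sub(/,.*/, "", s); status = s
    }
    /ANSWER: / {
        a = $0; sub(/.*ANSWER: /, "", a); sub(/,.*/, "", a); answers = a + 0
    }
    END {
        if (status == "NOERROR") print (answers > 0 ? "answer" : "nodata")
        else if (status != "")   print tolower(status)
        else if (timedout)       print "timeout"
        else                     print "unknown"
    }'
}
```

Run per NS IP and per record type (e.g.
"dig @"$nsip" +norecurse wsgba-nno.nabmig.synchronoss.net. AAAA | dig_classify"),
"nodata" is fine, "timeout" is the failure described above.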

Hmmm, and, egad, today's DNS meeting (another RR & domain) I have to
explain how authoritative NS data having exactly one NS record, and
that NS having exactly one A record (and zero AAAA records), is not
only neither reliable nor high availability, but also violates RFC
requirements.  <sigh>
One's (theoretically) super high availability DNS server does absolutely
no good if/when no packets can get to it!  (And that's already happened
at least twice this month).
Now when 'da boss says, "We'll bring our DNS team", it generally means
I've been drafted into the meeting.
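
The "only one NS record" problem is about the easiest of the lot to
check for mechanically.  A minimal sketch, assuming the RFC requirement
in question is the at-least-two-servers-per-zone rule of RFC 1034
section 4.1; ns_count_ok is a made-up name, and the input is dig record
lines in the usual "name TTL class type rdata" layout:

```shell
# ns_count_ok: read dig record lines on stdin; succeed (exit 0) only if
# they list at least two distinct NS targets, else fail (exit 1).
ns_count_ok() {
    awk '$4 == "NS" && !seen[tolower($5)]++ { n++ }
         END { exit (n >= 2) ? 0 : 1 }'
}
```

E.g. "dig +noall +answer nabmig.synchronoss.net. NS | ns_count_ok ||
echo 'fewer than 2 NS'".  This only counts names - it says nothing about
whether those names resolve to distinct, reachable addresses.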




