[conspire] Fun <cough, cough, cough> with DNS ;-) ... or ... let you count the ways ... ;->
Michael Paoli
Michael.Paoli at cal.berkeley.edu
Fri Mar 30 04:00:34 PDT 2018
So, ... "answers", for ...:
> From: "Michael Paoli" <Michael.Paoli at cal.berkeley.edu>
> Subject: Fun <cough, cough, cough> with DNS ;-) ... or ... let you
> count the ways ... ;->
> Date: Thu, 22 Mar 2018 04:55:33 -0700
> So ... *maybe* a fun 'lil exercise/challenge. ;-)
>
> I was almost thinking of bringing it up at or just before the last
> CABAL meeting. So, ... DNS booboos, troubleshooting, ...
>
> So, quite recently before the CABAL meeting, had encountered ...
> well, okay, maybe let's play a game. :-)
>
> Start, oh, ... say ... here:
> wsgba-nno.nabmig.synchronoss.net.
> Mmmm... let's say, oh, ... how fast can you find out what, if any,
> A and/or AAAA records exist for that DNS name (just to give one a
> starting point). Oh, and ... say from there, how many significant
> classes/categories of DNS errors can one find? Ooooh, game? :-)
> Sure, why not? How 'bout ...
> o +20 points for each significant class/category of DNS errors one can find
> o -15 points if you use some on-line web checker thingy
> o +5 points for each not strictly DNS but in or closely related to DNS issue
> o +20 points for using dig and/or delv
> o +10 points for using both dig and delv and stating/showing a good
> reason why
> o +10 points if you use delv but not dig and can give good reasons why
> o -10 points if you use nslookup
> o +15 points - DNSSEC - show what right/wrong/nothing they're doing there
> o +5 points - suggest how they could monitor to detect common DNS errors
> o Oh, and there may be a time factor ... they may eventually fix their errors
>
> Tip: you'll probably need IPv6 to fully test their DNS, but you may not
> need IPv6 connectivity to The Internet to find all or most all their DNS
> booboos (I strongly suspect each of their NS servers with IPv6 AAAA
> records is simply a single host that's dual-stacked with IPv4 and IPv6 - but
> I don't know for sure).
>
> So, ... who gets the high score? :-) Your opponent is the DNS provider,
> their objective is to make your score as low as possible ... they're not
> playing a very good game 8-O ... but to give 'em some credit, they did fix
> about half of their major DNS booboos since I told 'em about their issues
> shortly before the last CABAL meeting ... but, yeah, they still have a
> significant way to go on stepping up their game.
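For the DNSSEC item, one quick way to see what a zone is (or isn't) doing is to ask a validating resolver with `dig +dnssec` and look for the `ad` flag and any RRSIG records in the reply. A minimal bash sketch of that, assuming the resolver dig talks to actually performs DNSSEC validation (if it doesn't, a signed zone will only ever show up as "signed-not-validated"); the function name `dnssec_status` and its three labels are hypothetical. delv, by contrast, does the validation itself, which is one reason to use it alongside dig.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify the DNSSEC status of a single dig reply.
# Expects the full default output of something like:
#   dig +dnssec wsgba-nno.nabmig.synchronoss.net. A
# on stdin, where dig's resolver is ASSUMED to validate DNSSEC.
dnssec_status() {
  local out
  out=$(cat)
  if printf '%s\n' "$out" | grep -Eq '^;; flags:[^;]* ad[ ;]'; then
    # AD (authenticated data) flag set: the resolver validated the answer.
    echo validated
  elif printf '%s\n' "$out" | grep -Eq '[[:space:]]IN[[:space:]]+RRSIG[[:space:]]'; then
    # Signatures present but no AD flag: signed, yet not validated here.
    echo signed-not-validated
  else
    # No AD flag and no RRSIG records: the answer is unsigned.
    echo unsigned
  fi
}
```

Usage would be along the lines of `dig +dnssec wsgba-nno.nabmig.synchronoss.net. A | dnssec_status`.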
And, further below, slightly redacted and with some formatting tweaks,
the earlier analysis I'd happened to do the day before that CABAL meeting.
< From: Paoli, Michael
< Date: Fri, Mar 9, 2018 at 4:38 PM
< Subject: How *not* to do DNS
< FYI, example of how not to do DNS:
< o 3 of 4 NS IPs fail
< o the one that doesn't completely fail, fails TCP
< o the delegation's NS records (authority section) and the authoritative
<   NS data from the NS itself are inconsistent and completely different
< o TTL on NS record is 0.
< 8-O
< There are more errors, but those top the list. <sigh>
< From: Paoli, Michael
< Sent: Friday, March 09, 2018 4:28 PM
< Subject: RE: DNS issues (wsgba-nno.nabmig.synchronoss.net. & *.nabmig.synchronoss.net.): RE:
< This check also shows some more (but not all of) the errors:
< http://dnscheck.iis.se/?time=1520641388&id=7620478&view=basic&test=standard
< From: Paoli, Michael
< Sent: Friday, March 09, 2018 4:22 PM
< Subject: DNS issues (wsgba-nno.nabmig.synchronoss.net. & *.nabmig.synchronoss.net.): RE:
< DNS issues ...
< This part is fine and quite fast enough:
< [REDACTED] 7200 IN CNAME wsgba-nno.nabmig.synchronoss.net.
< However, issues/failures happen that cause delays (and also outright
< failures) in getting IP address(es) for
< wsgba-nno.nabmig.synchronoss.net.; most notably, 3 of 4 NS IPs fail,
< and the NS TTL is 0, which disallows caching and forces lookups to the
< NS IPs every time ... 3 out of 4 of which fail.
< e.g.:
< $ dig +trace wsgba-nno.nabmig.synchronoss.net. A wsgba-nno.nabmig.synchronoss.net. AAAA
< ...
< synchronoss.net. 172800 IN NS ns1.synchronoss.com.
< synchronoss.net. 172800 IN NS ns2.synchronoss.com.
< ;; Received 629 bytes from 192.52.178.30#53(k.gtld-servers.net) in 61 ms
< ...
< nabmig.synchronoss.net. 1800 IN NS nabseagtm01.nabmig.synchronoss.net.
< nabmig.synchronoss.net. 1800 IN NS nabcuagtm01.nabmig.synchronoss.net.
< ;; Received 201 bytes from 68.170.16.16#53(ns1.synchronoss.com) in 125 ms
< ...
< ;; connection timed out; no servers could be reached
< $
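The NS TTL of 0 is easy to spot mechanically. A minimal sketch, assuming bash plus awk and dig's usual answer-line layout (owner, TTL, class, type, data); the function name `zero_ttl_ns` is made up for illustration. It prints each NS record whose TTL is 0 and exits non-zero if it finds any, so it could plausibly be dropped into a cron job or monitoring check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read dig answer lines on stdin, e.g. from
#   dig +noall +answer nabmig.synchronoss.net. NS
# and report any NS record whose TTL is 0 (i.e. uncacheable).
# Returns 1 if at least one such record was found, else 0.
zero_ttl_ns() {
  # Fields: $1 owner, $2 TTL, $3 class, $4 type, $5 target.
  awk '$1 !~ /^;/ && $2 == 0 && $4 == "NS" {
         print "TTL 0 (uncacheable) NS: " $1 " -> " $5
         found = 1
       }
       END { exit found }'
}
```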
< Also seeing inconsistent results for answers on queries for
< nameservers (authority NS and "glue" records should be consistent) and
< 0 TTLs on some NS RRs (TTL of 0 disables caching, thus forces lookup
< each time - and 3 of the 4 NS IPs fail (see also further below)):
< $ dig +noall +answer nabmig.synchronoss.net. NS
< nabmig.synchronoss.net. 0 IN NS nabgtm.nabmig.synchronoss.net.
< $ dig @68.170.29.212 +noall +norecurse +answer nabmig.synchronoss.net. NS
< nabmig.synchronoss.net. 0 IN NS nabgtm.nabmig.synchronoss.net.
< $ dig +noall +answer nabseagtm01.nabmig.synchronoss.net. A \
<     nabseagtm01.nabmig.synchronoss.net. AAAA \
<     nabcuagtm01.nabmig.synchronoss.net. A \
<     nabcuagtm01.nabmig.synchronoss.net. AAAA
< nabseagtm01.nabmig.synchronoss.net. 176 IN A 68.170.29.212
< nabseagtm01.nabmig.synchronoss.net. 176 IN AAAA 2620:11b:4002:1::1:4
< nabcuagtm01.nabmig.synchronoss.net. 176 IN A 68.170.28.212
< nabcuagtm01.nabmig.synchronoss.net. 176 IN AAAA 2620:11b:4000:1::1:4
< $
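Comparing the delegation against what the child zone's own servers say can also be scripted. A minimal bash sketch, assuming dig output has been saved to two files and that `sort` and `comm` are available; the function names `ns_targets` and `compare_ns_sets` are hypothetical. The parent side could come from asking the parent's server without recursion, e.g. `dig @ns1.synchronoss.com. +norecurse +noall +authority nabmig.synchronoss.net. NS` (a referral carries the NS records in the authority section), and the child side from `dig @68.170.29.212 +norecurse +noall +answer nabmig.synchronoss.net. NS`.

```shell
#!/usr/bin/env bash
# Print the sorted, de-duplicated, lower-cased NS targets found in a file
# of dig output (answer or authority lines; ";" comment lines skipped).
ns_targets() {
  awk '$1 !~ /^;/ && $4 == "NS" { print tolower($5) }' "$1" | sort -u
}

# Hypothetical check: compare the parent's delegation NS set (file $1)
# with the NS set served by the child zone's own servers (file $2).
# Prints the differences and returns 1 if the sets are not identical.
compare_ns_sets() {
  local diffs
  # comm -3 drops lines common to both; child-only lines are tab-indented.
  diffs=$(comm -3 <(ns_targets "$1") <(ns_targets "$2"))
  if [ -n "$diffs" ]; then
    printf 'NS mismatch (parent-only flush left, child-only indented):\n%s\n' "$diffs"
    return 1
  fi
  echo "NS sets consistent"
}
```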
< 3 out of 4 nameserver IP addresses are failing:
< $ (for nsip in 68.170.28.212 2620:11b:4000:1::1:4 68.170.29.212 \
<     2620:11b:4002:1::1:4; do dig @"$nsip" +noall +norecurse +answer \
<     wsgba-nno.nabmig.synchronoss.net. A \
<     wsgba-nno.nabmig.synchronoss.net. AAAA | sed -e 's/$/ ['"$nsip"']/'; done)
< wsgba-nno.nabmig.synchronoss.net. 30 IN A 68.170.28.18 [68.170.28.212]
< ;; connection timed out; no servers could be reached [68.170.28.212]
< ;; connection timed out; no servers could be reached [2620:11b:4000:1::1:4]
< ;; connection timed out; no servers could be reached [2620:11b:4000:1::1:4]
< wsgba-nno.nabmig.synchronoss.net. 30 IN A 68.170.28.18 [68.170.29.212]
< ;; connection timed out; no servers could be reached [68.170.29.212]
< ;; connection timed out; no servers could be reached [2620:11b:4002:1::1:4]
< ;; connection timed out; no servers could be reached [2620:11b:4002:1::1:4]
< $ dig +noall +answer nabgtm.nabmig.synchronoss.net. A nabgtm.nabmig.synchronoss.net. AAAA
< nabgtm.nabmig.synchronoss.net. 284 IN A 68.170.29.212
< nabgtm.nabmig.synchronoss.net. 284 IN A 68.170.28.212
< nabgtm.nabmig.synchronoss.net. 284 IN AAAA 2620:11b:4000:1::1:4
< nabgtm.nabmig.synchronoss.net. 284 IN AAAA 2620:11b:4002:1::1:4
< $
< All four NS IPs fail TCP connection:
< $ (for nsip in 68.170.28.212 2620:11b:4000:1::1:4 68.170.29.212 \
<     2620:11b:4002:1::1:4; do nc -vz -w 10 "$nsip" 53; done)
< nc: connect to 68.170.28.212 port 53 (tcp) timed out: Operation now in progress
< nc: connect to 2620:11b:4000:1::1:4 port 53 (tcp) timed out: Operation now in progress
< nc: connect to 68.170.29.212 port 53 (tcp) timed out: Operation now in progress
< nc: connect to 2620:11b:4002:1::1:4 port 53 (tcp) timed out: Operation now in progress
< $
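Since nc only shows whether a TCP connection can be opened, a check that actually sends DNS queries over both UDP and TCP to every NS address is a closer match to what resolvers do, and is the sort of thing that could run periodically as monitoring. A minimal bash sketch, assuming BIND's dig (which exits with status 9 when it gets no reply from the server); the function name `probe_ns_ips`, the 3 second timeout, and the output format are all assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical monitoring probe: query every listed nameserver IP directly
# (no recursion), over both UDP and TCP, for both A and AAAA.
# Prints "IP PROTO TYPE STATUS" per query; STATUS is the reply's rcode,
# or NO-REPLY if dig got no answer at all. Returns 1 if any query did
# not come back NOERROR (an empty NOERROR reply, i.e. NODATA, is fine).
probe_ns_ips() {
  local name=$1; shift
  local ip proto type opt out status rc=0
  for ip in "$@"; do
    for proto in udp tcp; do
      # +notcp is dig's default: UDP, with TCP only on truncation.
      case $proto in tcp) opt=+tcp ;; *) opt=+notcp ;; esac
      for type in A AAAA; do
        # +time/+tries keep one dead server from stalling the whole probe.
        if out=$(dig @"$ip" +norecurse +time=3 +tries=1 "$opt" "$name" "$type"); then
          status=$(printf '%s\n' "$out" | sed -n 's/.*status: \([A-Z]*\),.*/\1/p')
          status=${status:-UNKNOWN}
        else
          status=NO-REPLY
        fi
        printf '%s %s %s %s\n' "$ip" "$proto" "$type" "$status"
        [ "$status" = NOERROR ] || rc=1
      done
    done
  done
  return "$rc"
}
```

Usage would be something like `probe_ns_ips wsgba-nno.nabmig.synchronoss.net. 68.170.28.212 2620:11b:4000:1::1:4 68.170.29.212 2620:11b:4002:1::1:4`, alerting whenever it exits non-zero.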
< The issues also appear quite consistent whether checked from
< [REDACTED] or from The Internet.
< A somewhat simplistic on-line DNS check of the domain also shows many,
< but not all, of the errors:
< http://dnscheck.pingdom.com/?domain=nabmig.synchronoss.net&timestamp=1520640438&view=1
< FYI, traceroute data:
< $ (for nsip in 68.170.28.212 2620:11b:4000:1::1:4 68.170.29.212 \
<     2620:11b:4002:1::1:4; do for proto in U T; do echo "$proto:"; \
<     sudo traceroute -n"$proto"p 53 -f 5 -m 15 "$nsip"; done; done)
< U:
< traceroute to 68.170.28.212 (68.170.28.212), 15 hops max, 60 byte packets
< 5 154.54.43.70 15.694 ms 154.54.43.150 17.784 ms 154.54.43.70 19.634 ms
< 6 154.54.1.194 22.546 ms 24.927 ms 27.047 ms
< 7 * * *
< 8 * * *
< 9 4.31.104.10 98.373 ms 4.59.145.26 103.311 ms 4.31.104.10 103.794 ms
< 10 * 198.17.50.55 97.010 ms 101.944 ms
< 11 68.170.28.195 111.583 ms 122.304 ms 126.656 ms
< 12 68.170.28.196 125.285 ms 93.058 ms 99.947 ms
< 13 * * *
< 14 * * *
< 15 * * *
< T:
< traceroute to 68.170.28.212 (68.170.28.212), 15 hops max, 60 byte packets
< 5 154.54.43.150 15.649 ms 17.957 ms 20.361 ms
< 6 154.54.1.194 22.800 ms 24.964 ms *
< 7 * * *
< 8 * * *
< 9 4.59.145.26 104.272 ms 105.255 ms 108.228 ms
< 10 * * *
< 11 * * *
< 12 * * *
< 13 * * *
< 14 * * *
< 15 * * *
< U:
< traceroute to 2620:11b:4000:1::1:4 (2620:11b:4000:1::1:4), 15 hops max, 80 byte packets
< 5 2001:470:0:389::2 264.904 ms 267.920 ms 271.975 ms
< 6 2001:1900::3:189 384.752 ms 385.912 ms 386.952 ms
< 7 * * *
< 8 * * *
< 9 * * *
< 10 * * *
< 11 * * *
< 12 * * *
< 13 * * *
< 14 * * *
< 15 * * *
< T:
< traceroute to 2620:11b:4000:1::1:4 (2620:11b:4000:1::1:4), 15 hops max, 80 byte packets
< 5 2001:470:0:389::2 256.858 ms 260.771 ms 264.133 ms
< 6 * 2001:1900::3:189 366.564 ms 369.456 ms
< 7 * * *
< 8 * * *
< 9 * * *
< 10 * * *
< 11 * * *
< 12 * * *
< 13 * * *
< 14 * * *
< 15 * * *
< U:
< traceroute to 68.170.29.212 (68.170.29.212), 15 hops max, 60 byte packets
< 5 154.54.43.150 376.941 ms 154.54.43.70 379.314 ms 154.54.43.150 380.941 ms
< 6 154.54.5.102 383.172 ms 385.362 ms *
< 7 * * 4.68.110.137 387.584 ms
< 8 * * *
< 9 4.31.98.186 413.277 ms 414.936 ms 416.895 ms
< 10 * 74.116.106.76 44.078 ms 48.313 ms
< 11 68.170.29.203 52.217 ms 68.170.29.201 57.582 ms 68.170.29.203 61.305 ms
< 12 * * *
< 13 * * *
< 14 * * *
< 15 * * *
< T:
< traceroute to 68.170.29.212 (68.170.29.212), 15 hops max, 60 byte packets
< 5 154.54.43.70 17.733 ms 19.041 ms 154.54.43.150 34.556 ms
< 6 154.54.1.194 34.596 ms 34.744 ms *
< 7 * * 4.68.110.137 34.684 ms
< 8 * * *
< 9 4.31.98.186 54.960 ms 56.917 ms 59.304 ms
< 10 * * *
< 11 * * *
< 12 * * *
< 13 * * *
< 14 * * *
< 15 * * *
< U:
< traceroute to 2620:11b:4002:1::1:4 (2620:11b:4002:1::1:4), 15 hops max, 80 byte packets
< 5 2001:470:0:389::2 18.358 ms 21.621 ms 24.471 ms
< 6 2001:1900::3:197 45.736 ms 55.685 ms 59.062 ms
< 7 * * *
< 8 * * *
< 9 * * *
< 10 * * *
< 11 * * *
< 12 * * *
< 13 * * *
< 14 * * *
< 15 * * *
< T:
< traceroute to 2620:11b:4002:1::1:4 (2620:11b:4002:1::1:4), 15 hops max, 80 byte packets
< 5 2001:470:0:389::2 61.447 ms 64.501 ms 67.551 ms
< 6 2001:1900::3:197 88.716 ms 92.369 ms *
< 7 * * *
< 8 * * *
< 9 * * *
< 10 * * *
< 11 * * *
< 12 * * *
< 13 * * *
< 14 * * *
< 15 * * *
< $
< From: [REDACTED]
< Sent: Friday, March 09, 2018 2:20 PM
< Verified the connectivity to [REDACTED]
< However, the command below is slow because the ns lookup takes long.
< telnet [REDACTED] 443
< If use the IP directly, it's fast:
< telnet 68.170.28.18 443
< Can you please check it out?
So ... "answers" :-) Having recently taught someone about rot13,
semi-tempted to ... anyway, been long enough now, eh?
And, to give 'em *some* credit, they've since fixed several of the DNS
errors that were present at that earlier time ... but alas, last I
checked, several key errors still remain (egad, I have to attend yet
another meeting next week to explain to 'em how their DNS is messed up).
So, some of the many key issues shown in my earlier analysis (some of
which have since been corrected):
o 3 of 4 NS servers not answering queries (mostly(?) corrected; that
may also have been an issue caused or exacerbated by traffic load due to
the other errors: the RR name itself gets rather to quite high traffic
(I think hundreds to thousands or more connections per second), so
certain screw-ups in DNS would also cause the related DNS traffic to
spike heavily).
o NS TTL of 0 forcing no cache and traffic always to authoritative
servers (corrected)
o authoritative servers fail to respond to AAAA queries (a response of
"we ain't got no AAAA records for that" would be fine, but failing to
respond is not: that's a failure, and can't be cached, so all such
queries are forced to wait out the timeout failure every time).
o 0 of 4 NS server IPs function over TCP
o inconsistent DNS data between delegating authority NS and
authoritative server (corrected) - I didn't include all that data in
what's shown in those emails (it was in other captured data/emails) ...
but it might still be partially or fully available at some of those
DNS test URLs showing the earlier data.
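The difference between an acceptable "no AAAA records" reply and no reply at all can be told apart from dig's output. A minimal bash sketch, assuming dig's default output format (the `status:` header line and the `ANSWER:` count on the flags line); the function name `classify_dig_reply` and its labels are hypothetical. NODATA here means NOERROR with an empty answer section, which resolvers are able to negatively cache; NO-RESPONSE is the timeout case.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the full default output of one dig query on
# stdin and print ANSWER, NODATA, NO-RESPONSE, or the rcode itself
# (NXDOMAIN, SERVFAIL, REFUSED, ...).
classify_dig_reply() {
  local out status answers
  out=$(cat)
  # dig prints this when it never got a reply (timeouts, unreachable).
  if printf '%s\n' "$out" | grep -q 'no servers could be reached'; then
    echo NO-RESPONSE
    return
  fi
  # Header line, e.g. ";; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 1"
  status=$(printf '%s\n' "$out" | sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1)
  # Flags line, e.g. ";; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ..."
  answers=$(printf '%s\n' "$out" | sed -n 's/.*ANSWER: \([0-9]*\),.*/\1/p' | head -n 1)
  case $status in
    NOERROR)
      if [ "${answers:-0}" -gt 0 ]; then echo ANSWER; else echo NODATA; fi ;;
    "")
      echo NO-RESPONSE ;;
    *)
      echo "$status" ;;
  esac
}
```

For example, `dig @68.170.28.212 +norecurse wsgba-nno.nabmig.synchronoss.net. AAAA | classify_dig_reply` should print NODATA from a well-behaved server with no AAAA records, rather than NO-RESPONSE.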
Hmmm, and, egad, at today's DNS meeting (another RR & domain) I have to
explain how an authoritative zone having only and exactly one NS record,
with that NS having only and exactly one A record (and zero AAAA
records), is neither reliable nor highly available, and also violates
RFC requirements (RFC 1034, section 4.1, requires every zone to be
available on at least two servers). <sigh>
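That single-NS, single-address setup is also easy to catch automatically. A minimal bash sketch that counts distinct nameserver names and distinct addresses behind them; the function name `check_ns_redundancy` and its "nameserver address" input format are assumptions for illustration, and the threshold of two follows the RFC 1034 at-least-two-servers requirement. Two addresses on the same host or network would still pass, so this only catches the most basic case.

```shell
#!/usr/bin/env bash
# Hypothetical redundancy check. Input on stdin: one "nameserver address"
# pair per line, e.g. built with:
#   for ns in $(dig +short example.com. NS); do
#     for a in $(dig +short "$ns" A "$ns" AAAA); do echo "$ns $a"; done
#   done
# Prints the counts; returns 1 if there are fewer than two distinct
# nameservers or fewer than two distinct addresses behind them.
check_ns_redundancy() {
  awk 'NF >= 2 { names[tolower($1)] = 1; addrs[$2] = 1 }
       END {
         for (n in names) nn++
         for (a in addrs) na++
         printf "%d nameserver(s), %d address(es)\n", nn, na
         exit (nn < 2 || na < 2)
       }'
}
```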
One's (theoretically) super high availability DNS server does absolutely
no good if/when no packets can get to it! (And that's already happened
at least twice this month.)
Now when 'da boss says, "We'll bring our DNS team", it generally means
I've been drafted into the meeting.