[sf-lug] resolver problem

Fri Apr 8 02:42:56 PDT 2016

Quoting Michael Paoli (Michael.Paoli at cal.berkeley.edu):

> I'll also mention, with modern webservers and virtual name hosting and
> SNI, etc., hitting an http or https URL by IP address, rather than
> hostname, may not get same results, even if the site is working
> perfectly and as expected.  However in both cases, one should at least
> still be able to connect to the site (open TCP connection to the IP
> address and port - by name or IP address).

Yeah, I just didn't want to get into that fine point.  I figured if Alex
test-connected by IP rather than FQDN and reported back 'I got a page
back but not what I expected', I'd explain 'That means success, but
the NameVirtualHost complication meant the HTTPd didn't return the exact
requested site HTML', and such.
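To make the NameVirtualHost point concrete, here's a toy model (purely illustrative; the hostnames and page contents are made up) of how a server doing name-based virtual hosting picks a site from the request's Host header. It mimics Apache's documented rule that when no ServerName matches, the first vhost listed for that address/port acts as the default. A request to a bare IP address matches no name, so it gets the default site: the TCP connection worked fine, the page just isn't the one expected. (With HTTPS the same choice is driven by the SNI hostname, which clients generally don't send for an IP literal.)

```python
from typing import List, Optional, Tuple

# Hypothetical vhost table: (server_name, page served). Order matters:
# as with Apache name-based vhosts, the first entry acts as the default.
VHOSTS: List[Tuple[str, str]] = [
    ("default.example.org", "<h1>Default site</h1>"),
    ("www.example.org", "<h1>Example home page</h1>"),
]


def select_vhost(host_header: Optional[str],
                 vhosts: List[Tuple[str, str]] = VHOSTS) -> str:
    """Return the page a name-based virtual-hosting server would serve."""
    if host_header:
        # Drop any ":port" suffix and compare case-insensitively
        # (IPv6 literals are ignored here for brevity).
        name = host_header.partition(":")[0].lower()
        for server_name, page in vhosts:
            if server_name == name:
                return page
    # No match (e.g., the client used a bare IP address): default vhost.
    return vhosts[0][1]


print(select_vhost("www.example.org"))  # the site you asked for
print(select_vhost("192.0.2.10"))       # by IP: the default site instead
```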

Sometimes, it's actively bad to cover every little detail, possible edge
case, etc., because the resulting verbosity stands in the way of clarity
on the essential message.

> I'd also be very interested in seeing
> tcpdump capture of UDP and TCP traffic to port 53,
> while strace is run on each of these commands:
> $ curl -I https://github.com/
> $ ping github.com
> ... each captured and saved separately for those two commands.
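For anyone wanting to try that anyway, here's a sketch of the command lines involved, wrapped in a small Python helper so each command gets its own pair of capture files. The tcpdump flags (-i any, -n, -w, and a `port 53` filter, which matches both UDP and TCP) and strace flags (-f to follow forks and threads, -o for the output file) are standard; the helper names and file-naming scheme are hypothetical, `-c 3` is added to ping only so it exits on its own, and tcpdump needs root.

```python
import subprocess
import time
from typing import List, Tuple


def capture_argv(label: str, command: List[str]) -> Tuple[List[str], List[str]]:
    """Build tcpdump and strace argv lists for one command (hypothetical helper)."""
    pcap = f"dns-{label}.pcap"
    trace = f"strace-{label}.txt"
    # 'port 53' matches UDP and TCP traffic to or from port 53.
    tcpdump = ["tcpdump", "-i", "any", "-n", "-w", pcap, "port", "53"]
    # -f follows child processes and threads (curl may resolve in a thread).
    strace = ["strace", "-f", "-o", trace] + command
    return tcpdump, strace


def run_capture(label: str, command: List[str]) -> None:
    """Run command under strace while tcpdump records port-53 traffic."""
    tcpdump_argv, strace_argv = capture_argv(label, command)
    sniffer = subprocess.Popen(tcpdump_argv)
    time.sleep(2)  # give tcpdump a moment to open the interface
    try:
        subprocess.run(strace_argv, check=False)
    finally:
        sniffer.terminate()  # SIGTERM makes tcpdump flush and close its file
        sniffer.wait()

# Usage (as root), one run per command so the captures stay separate:
#   run_capture("curl", ["curl", "-I", "https://github.com/"])
#   run_capture("ping", ["ping", "-c", "3", "github.com"])
```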

Alex is amazing, but you're asking for some pretty hard-core stuff from
him.  Could happen.  ;->

Me, I'd just see if the DNS-related badness observed in the installed
distro occurs identically if running the exact same Ubuntu release from
live-CD media.  If (as expected) no, then assume the installed OS is
screwed up.

If the host's installed OS is screwed up, then it's either user dotfiles
in homedirs, system conffiles in /etc, or system binaries/libs.
Problems with user dotfiles can be cross-checked by testing using a new
scratch user.  But if the problem's one of the other two things, then
IMO the sysadmin needs to replace the blown system and, in the future,
be a great deal more careful what he/she does with root authority
(including sudo superuser, etc.).

IMO, the key tool to use is simple logic, as usual.  Tally up suspects,
attempt to see which can be ruled out, use standard methods for dividing
one large problem into multiple smaller problems, and all the usual ways
of figuring things out.
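The most mechanical form of "divide one large problem into smaller ones" is bisection. Given an ordered list of changes (say, from that log of what you changed) and some test that says whether the problem is present, halving the range each time finds the first bad change in about log2(n) tests instead of n. A minimal sketch, assuming a hypothetical `is_broken(k)` predicate meaning "with only the first k changes applied, the problem shows up", and assuming the breakage, once introduced, stays.

```python
from typing import Callable


def first_bad(n: int, is_broken: Callable[[int], bool]) -> int:
    """Return the smallest k in 1..n for which is_broken(k) is True.

    Assumes is_broken(0) is False (known-good baseline), is_broken(n) is
    True, and that breakage is monotonic: once present, it stays present.
    """
    lo, hi = 0, n  # invariant: is_broken(lo) is False, is_broken(hi) is True
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_broken(mid):
            hi = mid
        else:
            lo = mid
    return hi


changes = [  # hypothetical change log, oldest first
    "edited /etc/hosts",
    "installed dnsmasq",
    "edited /etc/nsswitch.conf",
    "upgraded libc6",
]
# Pretend the problem appeared with the third change.
culprit = first_bad(len(changes), lambda applied: applied >= 3)
print(changes[culprit - 1])
```

In real life each `is_broken` call means rolling the box back to a state with only those changes applied and re-testing, which is exactly why a record of what was changed matters.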

> Oh, and of course also, good to keep a log of what you observe, try,
> change, etc.

I don't want to dump on Alex, whom I hold in high regard, but the
leading suspect in any system-level damage to a *ix box is the sysadmin
-- i.e., actions a superuser took that changed the contents of the /etc tree or
system binaries.  Anyone who doesn't believe that, try absent-mindedly
messing around with /etc/pam.d/* or /etc/nsswitch.conf using a text
editor and root (or, by implication, sudo superuser) authority.  For
extra credit, don't keep track of what you change.

It's not for nothing that sysadmins adopted configuration management.
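Tools such as etckeeper (which keeps /etc in a version-control repository) exist for the same reason. As a much cruder illustration of "keep track of what you change", here's a sketch that records a SHA-256 digest of every regular file under a directory and later reports what was added, removed, or modified. The function names are hypothetical, and reading all of /etc generally needs root; unreadable files are simply skipped.

```python
import hashlib
import os
from typing import Dict, List, Tuple


def snapshot(root: str) -> Dict[str, str]:
    """Map each regular file under root (as a relative path) to its SHA-256."""
    digests: Dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and non-regular files
            try:
                with open(path, "rb") as fh:
                    digest = hashlib.sha256(fh.read()).hexdigest()
            except OSError:
                continue  # unreadable (e.g., permission denied): skip it
            digests[os.path.relpath(path, root)] = digest
    return digests


def diff(old: Dict[str, str],
         new: Dict[str, str]) -> Tuple[List[str], List[str], List[str]]:
    """Return (added, removed, modified) relative paths, each sorted."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    modified = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return added, removed, modified

# Usage: before = snapshot("/etc"); save it (e.g., with json.dump);
# after editing, compare with diff(before, snapshot("/etc")).
```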

> So, yes, I find it odd that ping works, but curl fails with an apparent
> resolver error - how might it be that it seems one resolves, but the
> other doesn't?  That does seem quite odd.  But there's an answer down
> there somewhere.

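Am betting that ping and curl use different library calls (name lookup
isn't a syscall as such) for DNS lookup.  Like maybe, one goes through the
glibc-provided resolver via the Name Service Switch (so /etc/nsswitch.conf
decides, possibly consulting /etc/hosts first), while the other queries a
nameserver from /etc/resolv.conf directly.  Or something like that.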
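One way to test that bet: Python's socket.getaddrinfo() goes through glibc, and therefore through whatever /etc/nsswitch.conf says (files, mDNS, and so on, before or instead of DNS), whereas a hand-built query sent straight to the first `nameserver` in /etc/resolv.conf bypasses all of that. If one path succeeds and the other fails, the two really do differ. A sketch under those assumptions: it sends a single UDP query for an A record and only checks the reply's ID, RCODE, and answer count, with no retries, TCP fallback, or answer parsing. The helper names are hypothetical.

```python
import socket
import struct
from typing import List


def nameservers(resolv_conf_text: str) -> List[str]:
    """Extract 'nameserver' addresses from resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal RFC 1035 query for the A record of name."""
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def query_nameserver(name: str, server: str, timeout: float = 3.0) -> int:
    """Send the query to server over UDP port 53; return the answer count.

    Raises RuntimeError on an ID mismatch or a non-zero RCODE
    (e.g., 3 = NXDOMAIN), and socket.timeout if nothing comes back.
    """
    query = build_query(name)
    family = socket.AF_INET6 if ":" in server else socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (server, 53))
        reply, _addr = sock.recvfrom(4096)
    reply_id, flags, _qdcount, ancount = struct.unpack("!HHHH", reply[:8])
    if reply_id != struct.unpack("!H", query[:2])[0]:
        raise RuntimeError("reply ID does not match query ID")
    rcode = flags & 0x000F
    if rcode != 0:
        raise RuntimeError(f"nameserver returned RCODE {rcode}")
    return ancount

# Usage (needs network access):
#   print(socket.getaddrinfo("github.com", 443))   # glibc / NSS path
#   with open("/etc/resolv.conf") as fh:
#       server = nameservers(fh.read())[0]
#   print(query_nameserver("github.com", server))  # direct DNS path
```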