[sf-lug] resolver problem

Michael Paoli Michael.Paoli at cal.berkeley.edu
Fri Apr 8 23:07:52 PDT 2016


So ...

What of telnet?  Somewhere among the many postings, I said something
roughly along the lines of: use the simplest example that can reproduce
the problem.  A browser - especially of the GUI form - is quite large
and complex.  curl - much better.  Even better - wget.  Yet better,
telnet ... or a trace better yet, nc.

I made mention of strace ... Rick likewise made mention of how it
can be overkill / a sledgehammer ... tons of output, etc.  But whittled
down to the simplest thing to trace that can still show the problem, it
becomes quite a bit easier to work with.  In the case of nc, using
strace on that ... I get less than 230 lines of data.  That may still
seem like "a lot" - but that's before even filtering out
"uninteresting" stuff.  But perhaps more importantly - you indicate it
works when you boot from the ISO image, but not when running off the
operating system installed on the host's drive.  So ... capturing
similar from both - same command and all, one working, one not,
respectively booted from ISO, and host's drive - a side-by-side
comparison would likely be highly informative.  Right around where they
start to significantly differ is likely at or exceedingly close to
where our problem is.  So, that could very substantially narrow it down
quite quickly.  And particularly where it works in both cases with
ping, but fails with, e.g., Firefox and curl - as was pointed out by
Rick, it's highly probable there's some issue common to both that's
impacting resolution for many, but not all, things/programs - even when
trying to resolve the same name.

So, if you do:
$ </dev/null telnet github.com 80
Do you get something like:
Trying 192.30.252.123...
Connected to github.com.
Escape character is '^]'.
Connection closed by foreign host.
$
Or do you get something like a resolver error?
Or better yet, do you still get an error resolving the name if you use nc?
E.g.:
$ nc -z github.com 80
$ echo $?
0
$
That nc works, resolves, and connects, as can be seen by its exit value
- or one could add the -v option.  If I use a name that doesn't
resolve:
$ nc -z test. 80
nc: getaddrinfo: Name or service not known
$
(the test TLD is reserved, thus won't exist on The Internet
(I hear we're supposed to lowercase internet now ... whatever ...
but that does, however, introduce more ambiguity, as at least
historically, internet and The Internet have different meanings.)).
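And with -v added, nc says what it's up to - the exact wording varies
by nc variant, but with the OpenBSD nc on Ubuntu it's roughly:
$ nc -vz github.com 80
Connection to github.com 80 port [tcp/http] succeeded!
$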
So ... presuming nc fails for you in the problematic environment,
how does the output of this:
$ 2>&1 strace -fv -eall -s2048 nc -z github.com 80 | cut -c-80
compare between that problematic environment
and when booted from the ISO? - most notably, where do the two differ
significantly, including context of, say, a half dozen or a bit more
lines before and after?

Oh, and you may want to capture the output without truncating it to
80-character width as I show with cut(1) ... but it may be easier to
compare - at least by eyeball - when lines are trimmed to 80 characters
(or even fewer); then it's much easier to eyeball 'em side-by-side ...
or one can do that rather effectively with suitable options to diff(1).
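E.g., something roughly like this (the file names here are just
hypothetical placeholders for wherever you save each capture):
$ diff -y -W 162 iso.strace disk.strace | less
diff's -y gives side-by-side output and -W sets the total width; adding
--suppress-common-lines shows only where the two traces diverge (though
then one loses the surrounding context).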

And why do I ask for telnet or nc?  I started futzing with strace,
looking for the simplest thing that may be able to capture the issue
you're experiencing.  I tried curl - nice that it has the -I option, to
just do headers, and not follow redirects, etc.  But I also found that
curl does a fork (or at least a clone(2)) - unneeded complexity - wget
avoids that - but doesn't have quite as easy and simple a way to not
follow redirects, etc.  http is also simpler than https.  But we don't
even need HTTP negotiation, as
presumably the issue shows before even completing a TCP connect.  So
telnet, or even nc, should quite suffice for that.

And looking at strace output on nc vs. even telnet, there's slightly
less stuff to wade through with nc - and telnet less than wget, and wget
curl.
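For a rough sense of the relative volume (counts will of course vary
with versions and environment), one can count trace lines per tool,
e.g.:
$ 2>&1 strace -fv -eall -s2048 nc -z github.com 80 | wc -l
$ </dev/null 2>&1 >>/dev/null strace -fv -eall -s2048 wget -q -O - \
> http://github.com/ | wc -l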

So ... let's presume you also get the failure with nc (or if not that,
with telnet, or if not that, with wget, or if not that, with curl).
Though not the same network or hardware, I can boot a DVD ISO pretty
conveniently ...
Ubuntu- 14.04.4 LTS "Trusty Tahr" - Release amd64 (20160217.1)
When I do the strace on nc, I get 228 lines of output (or 224 if the
lookup fails).  If I use telnet (and discard telnet's stderr and stdout
and redirect its stdin from /dev/null) I get 238 lines of data from
strace:
$ </dev/null 2>&1 >>/dev/null strace -fv -eall -s2048 telnet \
> github.com 80 | cut -c-80 | wc -l
238
$
In the above, the leading "> " is the shell's PS2 prompt.  Similarly,
with wget,
I get more lines.  And with curl, yet more.

And, exactly what OS versions?  Here's what I show booted from ISO:
$ lsb_release -d; uname -m; 2>&1 id | fold -s -w 72; tty; env | fgrep LC
Description:    Ubuntu 14.04.4 LTS
x86_64
uid=999(ubuntu) gid=999(ubuntu)
groups=999(ubuntu),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),108(lpa
dmin),124(sambashare)
/dev/ttyS0
LC_ALL=C
$
IIRC, you mentioned ubuntu 14.[0]4, but I don't think you detailed it
more specifically.  I do have a fair number of ISOs:
https://www.wiki.balug.org/wiki/doku.php?id=balug:cds_and_images_etc
But some are more conveniently at my fingertips:
$ (cd /var/tmp/ISOs && ls -d *.iso) | fgrep 14.04
kubuntu-14.04.2-desktop-amd64.iso
kubuntu-14.04.2-desktop-i386.iso
lubuntu-14.04.1-alternate-i386.iso
lubuntu-14.04.2-desktop-i386.iso
ubuntu-14.04.4-desktop-amd64.iso
ubuntu-14.04.4-desktop-i386.iso
ubuntu-14.04.4-server-amd64.iso

Rick also made mention of ltrace - also excellent tool.  Alas, I tend to
underutilize it - probably mostly because I learned of and had been
using strace for quite a while before I even first heard of ltrace.
ltrace may also be quite useful (maybe even more useful?) in this case,
for seeing the library calls; but strace might be more useful, e.g., for
knowing what resources are accessed, or where access attempts are made,
which attempts may fail and/or give unexpected results.
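E.g., a rough sketch - ltrace, like strace, writes its trace to stderr,
so to pull out just the resolver-related library call(s):
$ 2>&1 ltrace nc -z github.com 80 | fgrep getaddrinfo
Presuming nc is dynamically linked, that should show its getaddrinfo(3)
call and the result thereof.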

You could also do strace on ping, for comparison purposes, e.g.:
$ 2>&1 strace -fv -eall -s2048 ping -c 3 github.com | cut -c-80
That actually works and gives me even fewer lines than nc ...
but alas, we also want a capture of an example that should work, but
doesn't.  Anyway, a working example of ping from the OS on the host's
drive may also be useful in helping explain why ping works, but, e.g.,
curl fails ... at least if we happen to also be curious about that.

Oh, strace also has a handy -o option to save its output to a file.
You may want to do that, and can always do other processing after
capture to compare, etc.
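E.g. (the file name here is just a placeholder):
$ strace -fv -eall -s2048 -o nc.strace nc -z github.com 80
That keeps the trace out of the way of the traced command's own output,
with the full trace in nc.strace for later filtering/diffing.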

Oh, also, a point was made about possible drive error/corruption.
That's one of the reasons I earlier mentioned looking at logs - e.g.,
in case there are hardware errors, or other funkiness going on ...
logs and/or dmesg may show such errors.
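E.g., a quick (if crude) peek - log locations and useful patterns may
vary a bit by distribution:
$ dmesg | egrep -i 'error|fail|ata' | tail
$ 2>/dev/null egrep -i 'error|i/o' /var/log/syslog | tail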


> From: "Michael Paoli" <Michael.Paoli at cal.berkeley.edu>
> Subject: Re: [sf-lug] resolver problem
> Date: Fri, 08 Apr 2016 01:55:52 -0700

> Curiouser and curiouser!  ;-)
>
> Well, I still find it quite puzzling, that, e.g.:
> $ ping github.com
> works, whereas:
> $ curl -I https://github.com
> fails with:
> curl: (6) Could not resolve host: github.com
> That particular combination is rather bizarre.
>
> Nevertheless, computers are quite logical things - they do exactly what
> they're told (well, notwithstanding some hardware failures and such, but
> even in such case, they still follow the logic of their programming and
> electronics - whatever it dictates from whatever state it's in and goes
> through).
>
> On with divide and conquer.  :-)  There's an answer down there
> *somewhere*.
>
> So, I might suggest, to also make it a bit easier for those attempting
> to follow along and isolate cause of the issue(s)/problem(s), perhaps
> keep a high-level summary table, and also a section on particular
> diagnostic commands and their output (or relevant portions thereof).
> E.g.:
>
> results              operation/command
> ------------------------------------------------------------------------
> Server not found     Firefox https://github.com/
> Could not resolve    curl -I https://github.com/
> workaround for above when adding IP of github.com to /etc/hosts
> ok                   Chromium https://github.com/
> ok                   ping github.com
> same as above        different user, same host
> all above ok         when booted from Ubuntu 14.04[.x] DVD
>
> diagnostics/commands/output (or selected portions thereof):
>
> $ lsb_release -d || cat /etc/os-release
> ...
> $ uname -m
> ...
> $ cat /etc/resolv.conf
> ...
> $ fgrep hosts /etc/nsswitch.conf
> ...
> $ ip -4 a s
> ...
> $ ip -6 a s
> ...
> $ ip -4 r s
> ...
> $ ip -6 r s
> ...
> (if ip isn't on one's PATH, one can give full pathname, e.g. /sbin/ip)
> It would be good to also show the output of the above commands when
> booted from the Ubuntu DVD. (you could save them to another system or
> some writable USB storage, etc.).
>
> Might also want to occasionally tag that saved output with timestamps -
> that may be particularly valuable if at some point things are found to
> have changed in their behavior.  E.g. occasionally use:
> $ date -Iseconds
> (yes, I know, the -I option is deprecated, but it is so dang handy).
>
> Many have made other good/excellent suggestions and points, ... Rick
> Moen, particularly on looking at it from a bit more of network and
> network connectivity perspective.
>
> I'll also mention, with modern webservers and virtual name hosting and
> SNI, etc., hitting an http or https URL by IP address, rather than
> hostname, may not get same results, even if the site is working
> perfectly and as expected.  However in both cases, one should at least
> still be able to connect to the site (open TCP connection to the IP
> address and port - by name or IP address).
>
> I'd also be very interested in seeing
> tcpdump capture of UDP and TCP traffic to port 53,
> while strace is run on each of these commands:
> $ curl -I https://github.com/
> $ ping github.com
> ... each captured and saved separately for those two commands.
> strace output can be voluminous,
> so too for tcpdump, but limiting the capture to DNS traffic and for the
> short duration of running such command, should be relatively little
> tcpdump data to capture.
> On tcpdump, a few of the options I'd recommend,
> on Linux, can do:
> -i any
> that can be very handy if one isn't sure which interface(s) will be
> used.
> -s 0
> -n
> -p
> -w tcpdump.cap
> The above to save the raw capture - can then examine it with tcpdump
> (-r) or wireshark or whatever - use suitably named files for each
> separate capture (or rename them after completing capture).
> I think you can guess what options I commonly use on strace:
> $ cat ~/bin/Strace
> #!/bin/sh
> exec strace -fv -eall -s2048 ${1+"$@"}
> The -o option can also be handy to save to a file.
> And yes, I didn't detail all the various options I mentioned,
> man(1) is very good at covering those.  :-)
>
> Oh, and of course also, good to keep a log of what you observe, try,
> change, etc.
>
> So, yes, I find it odd that ping works, but curl fails with an apparent
> resolver error - how might it be that it seems one resolves, but the
> other doesn't?  That does seem quite odd.  But there's an answer down
> there somewhere.
>
>
>> From: "Alex Kleider" <akleider at sonic.net>
>> Subject: Re: [sf-lug] resolver problem
>> Date: Thu, 07 Apr 2016 08:53:04 -0700
>
>> On 2016-04-07 00:26, Michael Paoli wrote:
>>> A few more divide & conquer things you could try ...
>>>
>>> So ... works with one browser, not another, works with ping,
>>> but not git?  Efficient divide & conquer typically involves
>>> devising series of sufficiently easy/feasible tests, that
>>> effectively divide possible causes into two sets of
>>> roughly approximate probability, and results of test
>>> rule one set in, and the other out - or at least
>>> significantly assist in determining probability - after test -
>>> between the two sets.  One can also think of whittling it down
>>> to the simplest test that can show the "works" vs. "doesn't work" -
>>> at which point it's generally obvious what the problem is, or at least
>>> exactly where the problem is.
>>>
>>> So ... "browsers" ... & http & https.
>>>
>>> What about wget and/or curl?
>>>
>>> E.g.:
>>> $ curl -I https://www.google.com/
>>> $ curl -I https://github.com/
>>> and/or:
>>> $ wget -q -O - https://www.google.com/ | head -c 256; echo
>>> $ wget -q -O - https://github.com/ | head -c 256; echo
>>>
>>
>> as per instructions, here are the facts:
>> alex at x301:~$ curl -I https://google.com
>> HTTP/1.1 301 Moved Permanently
>> Location: https://www.google.com/
>> Content-Type: text/html; charset=UTF-8
>> Date: Thu, 07 Apr 2016 15:31:07 GMT
>> Expires: Sat, 07 May 2016 15:31:07 GMT
>> Cache-Control: public, max-age=2592000
>> Server: gws
>> Content-Length: 220
>> X-XSS-Protection: 1; mode=block
>> X-Frame-Options: SAMEORIGIN
>> Alternate-Protocol: 443:quic
>> Alt-Svc: quic=":443"; ma=2592000; v="32,31,30,29,28,27,26,25"
>>
>> alex at x301:~$ curl -I https://github.com
>> curl: (6) Could not resolve host: github.com
>>
>> and
>> alex at x301:~$ wget -q -O - https://www.google.com | head -c 256; echo
>> <!doctype html><html itemscope=""  
>> itemtype="http://schema.org/WebPage" lang="en"><head><meta  
>> content="Search the world's information, including webpages,  
>> images, videos and more. Google has many special features to help  
>> you find exactly what you're looking
>> alex at x301:~$ wget -q -O - https://github.com/ | head -c 256; echo
>>
>> alex at x301:~$
>>
>> Switching users: symptoms are the same:
>> Chromium can resolve URLs, Firefox cannot, and
>> pat at x301:~$ ifconfig
>> eth2      Link encap:Ethernet  HWaddr 00:1c:25:9d:e1:f0
>>          inet addr:10.0.0.16  Bcast:255.255.255.255  Mask:255.255.255.0
>>          inet6 addr: fe80::21c:25ff:fe9d:e1f0/64 Scope:Link
>>          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>          RX packets:397957 errors:0 dropped:0 overruns:0 frame:0
>>          TX packets:343874 errors:0 dropped:0 overruns:0 carrier:0
>>          collisions:0 txqueuelen:1000
>>          RX bytes:61678643 (61.6 MB)  TX bytes:30890987 (30.8 MB)
>>          Interrupt:20 Memory:f0600000-f0620000
>>
>> lo        Link encap:Local Loopback
>>          inet addr:127.0.0.1  Mask:255.0.0.0
>>          inet6 addr: ::1/128 Scope:Host
>>          UP LOOPBACK RUNNING  MTU:65536  Metric:1
>>          RX packets:263600 errors:0 dropped:0 overruns:0 frame:0
>>          TX packets:263600 errors:0 dropped:0 overruns:0 carrier:0
>>          collisions:0 txqueuelen:0
>>          RX bytes:27988121 (27.9 MB)  TX bytes:27988121 (27.9 MB)
>>
>> tun0      Link encap:UNSPEC  HWaddr  
>> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
>>          inet addr:172.31.2.14  P-t-P:172.31.2.13  Mask:255.255.255.255
>>          UP POINTOPOINT RUNNING NOARP MULTICAST  MTU:1500  Metric:1
>>          RX packets:149 errors:0 dropped:0 overruns:0 frame:0
>>          TX packets:49 errors:0 dropped:0 overruns:0 carrier:0
>>          collisions:0 txqueuelen:100
>>          RX bytes:17045 (17.0 KB)  TX bytes:6174 (6.1 KB)
>>
>> pat at x301:~$ ping github.com
>> PING github.com (192.30.252.120) 56(84) bytes of data.
>> 64 bytes from github.com (192.30.252.120): icmp_seq=1 ttl=52 time=88.4 ms
>> 64 bytes from github.com (192.30.252.120): icmp_seq=2 ttl=52 time=87.2 ms
>> 64 bytes from github.com (192.30.252.120): icmp_seq=3 ttl=52 time=88.4 ms
>> ^C
>> --- github.com ping statistics ---
>> 4 packets transmitted, 3 received, 25% packet loss, time 2999ms
>> rtt min/avg/max/mdev = 87.244/88.044/88.452/0.565 ms
>> pat at x301:~$ git clone https://github.com/alexKleider/debk.git
>> Cloning into 'debk'...
>> fatal: unable to access 'https://github.com/alexKleider/debk.git/':  
>> Could not resolve host: github.com
>> pat at x301:~$
>>
>>
>> Does this get us any closer to the root problem?
>> Alex




