[sf-lug] resolver problem

Michael Paoli Michael.Paoli at cal.berkeley.edu
Sun Apr 17 13:17:32 PDT 2016


So, Alex, ... I think at least some of us are curious ...

Did you get to the bottom of your "resolver" problem yet?
Or otherwise solve/fix it?
Or still working on it?
Any noteworthy developments/findings?

> From: "Michael Paoli" <Michael.Paoli at cal.berkeley.edu>
> To: "Alex Kleider" <akleider at sonic.net>
> Cc: sf-lug at linuxmafia.com
> Subject: Re: [sf-lug] resolver problem
> Date: Fri, 08 Apr 2016 23:07:52 -0700

> So ...
>
> What of telnet?  Somewhere among the many postings, I said something
> roughly along the lines of: use the simplest example that can
> reproduce the problem.  A browser - especially of the GUI form - is
> quite large and complex.  curl - much better.  Even better - wget.
> Yet better, telnet ... or, a bit better yet for tracing, nc.
>
> I made mention of strace ... Rick likewise made mention of how it
> can be overkill / a sledgehammer ... tons of output, etc.  But
> whittled down to the simplest thing to trace that can still show the
> problem, it becomes quite a bit easier to work with.  In the case of
> nc, using
> strace on that ... I get less than 230 lines of data.  That may still
> seem like "a lot" - but that's before even filtering out
> "uninteresting" stuff.  But perhaps more importantly - you indicate it
> works when you boot from the ISO image, but not when running off the
> operating system installed on the host's drive.  So ... capturing
> similar from both - same command and all, one working, one not,
> respectively booted from ISO, and host's drive - a side-by-side
> comparison would likely be highly informative.  Right around where they
> start to significantly differ is likely at or exceedingly close to
> where our problem is.  So, that could very substantially narrow it down
> quite quickly.  And particularly where it works in both cases with
> ping, but fails with, e.g., Firefox and curl - as was pointed out by
> Rick, it's highly probable there's some issue common to both that's
> impacting resolution for many, but not all, things/programs - even
> when trying to resolve the same name.
>
> So, if you do:
> $ </dev/null telnet github.com 80
> Do you get something like:
> Trying 192.30.252.123...
> Connected to github.com.
> Escape character is '^]'.
> Connection closed by foreign host.
> $
> Or do you get something like a resolver error?
> Or better yet, do you still get an error resolving the name if you use nc?
> E.g.:
> $ nc -z github.com 80
> $ echo $?
> 0
> $
> There nc works - resolves and connects - as can be seen by its exit
> value; or one could add the -v option.  If I use a name that doesn't
> resolve:
> $ nc -z test. 80
> nc: getaddrinfo: Name or service not known
> $
> (the test TLD is reserved, thus won't exist on The Internet - I hear
> we're supposed to lowercase internet now ... whatever ... but that
> does, however, introduce more ambiguity, as at least historically,
> internet and The Internet have different meanings).
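> Or, for a bit more verbose detail on either of the above (exact
> wording of the output varies by nc variant/version), one can add -v,
> e.g.:
> $ nc -vz github.com 80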
> So ... presuming nc fails for you on the problematic environment,
> how does the output of this:
> $ 2>&1 strace -fv -eall -s2048 nc -z github.com 80 | cut -c-80
> compare between that problematic environment and booted from the ISO?
> Most notably, where do they first differ significantly, including
> context of, say, a half dozen or a bit more lines before and after?
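> E.g., something along these lines, run in each environment and saved
> to a file (file names here just my arbitrary choices):
> $ strace -fv -eall -s2048 nc -z github.com 80 > nc.hd.strace 2>&1
> ... and likewise booted from the ISO, saving to, say, nc.iso.strace.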
>
> Oh, and you may want to capture the output without truncating to 80
> character width as I show with cut(1) ... but it may be easier to
> compare - at least by eyeball - when they're trimmed to 80 characters
> (or even less); then it's much easier to eyeball 'em side-by-side ...
> or one can rather effectively do that with suitable options to
> diff(1).
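>
> E.g., with the two captures saved as above (names again arbitrary),
> something like:
> $ cut -c-80 < nc.hd.strace > hd.80
> $ cut -c-80 < nc.iso.strace > iso.80
> $ diff -y -W 165 hd.80 iso.80 | less
> (diff's -y gives side-by-side output; -W sets the total output width.)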
>
> And why do I ask for telnet or nc?  I started futzing with strace,
> looking for the simplest thing that may be able to capture the issue
> you're experiencing.  I tried curl - nice that it has the -I option,
> to just do headers, and not follow redirects, etc.  But I also found
> that curl does a fork (or at least clone(2)) - unneeded complexity -
> wget avoids that, but doesn't have quite as easy and simple a way to
> not follow redirects, etc.  And http is
> also simpler than https.  But we don't even need HTTP negotiation, as
> presumably the issue shows before even completing a TCP connect.  So
> telnet, or even nc, should quite suffice for that.
>
> And looking at strace output on nc vs. even telnet, there's slightly
> less stuff to wade through with nc - and telnet less than wget, and
> wget less than
> curl.
>
> So ... let's presume you also get the failure with nc (or if not that,
> with telnet, or if not that, with wget, or if not that, with curl).
> Though not the same network or hardware, I can boot DVD ISO pretty
> conveniently ...
> Ubuntu 14.04.4 LTS "Trusty Tahr" - Release amd64 (20160217.1)
> When I do the strace on nc, I get 228 lines of output (or 224 if the
> lookup fails).  If I use telnet (and discard telnet's stderr and stdout
> and redirect its stdin from /dev/null) I get 238 lines of data from
> strace:
> $ </dev/null 2>&1 >>/dev/null strace -fv -eall -s2048 telnet \
>> github.com 80 | cut -c-80 | wc -l
> 238
> $
> In the above, the leading "> " is the shell's PS2 prompt.  Similarly,
> with wget,
> I get more lines.  And with curl, yet more.
>
> And, exactly what OS versions?  Here's what I show booted from ISO:
> $ lsb_release -d; uname -m; 2>&1 id | fold -s -w 72; tty; env | fgrep LC
> Description:    Ubuntu 14.04.4 LTS
> x86_64
> uid=999(ubuntu) gid=999(ubuntu)
> groups=999(ubuntu),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),108(lpa
> dmin),124(sambashare)
> /dev/ttyS0
> LC_ALL=C
> $
> IIRC, you mentioned Ubuntu 14.[0]4, but I don't think you detailed it
> more specifically.  I do have a fair number of ISOs:
> https://www.wiki.balug.org/wiki/doku.php?id=balug:cds_and_images_etc
> But some are more conveniently at my fingertips:
> $ (cd /var/tmp/ISOs && ls -d *.iso) | fgrep 14.04
> kubuntu-14.04.2-desktop-amd64.iso
> kubuntu-14.04.2-desktop-i386.iso
> lubuntu-14.04.1-alternate-i386.iso
> lubuntu-14.04.2-desktop-i386.iso
> ubuntu-14.04.4-desktop-amd64.iso
> ubuntu-14.04.4-desktop-i386.iso
> ubuntu-14.04.4-server-amd64.iso
>
> Rick also made mention of ltrace - also an excellent tool.  Alas, I
> tend to underutilize it - probably mostly because I learned of and had
> been using strace for quite a while before I even first heard of
> ltrace.  ltrace may be quite useful (maybe even more useful?) in this
> case, as it traces library calls such as getaddrinfo(3), but strace
> might be more useful, e.g., in knowing what resources are accessed, or
> where access attempts are made, and which attempts fail and/or give
> unexpected results.
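>
> If one wants to give ltrace a try here, something roughly analogous
> to the strace invocation might be (options per ltrace(1); -S also
> shows system calls along with the library calls):
> $ ltrace -f -S -s 2048 nc -z github.com 80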
>
> You could also do strace on ping, for comparison purposes, e.g.:
> $ 2>&1 strace -fv -eall -s2048 ping -c 3 github.com | cut -c-80
> That actually works and gives me even fewer lines than nc ... but
> alas, we also want a capture of an example that should work, but
> doesn't.
> Anyway, a working example of ping from the OS on the host's drive may
> also be useful in helping explain why ping works but, e.g., curl
> fails ... at least if we happen to also be curious about that.
>
> Oh, strace also has a handy -o option to save its output to a file.
> You may want to do that - one can always do other stuff with the
> captures afterwards to compare, etc.
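> E.g.:
> $ strace -fv -eall -s2048 -o nc.strace nc -z github.com 80
> With -o, strace's output goes to the named file (name here arbitrary),
> and also doesn't get intermixed with the traced command's own stderr.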
>
> Oh, also, a point was made about possible drive error/corruption.
> That's one of the reasons I earlier mentioned looking at the logs -
> e.g., in case there are hardware errors, or other funkiness going
> on ... the logs and/or dmesg may show such errors.
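>
> E.g., as a rough first pass (patterns here just a starting point,
> adjust to taste):
> $ dmesg | egrep -i 'error|fail|ata[0-9]' | tail
> $ sudo egrep -i 'error|fail' /var/log/syslog | tail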
>
>
>> From: "Michael Paoli" <Michael.Paoli at cal.berkeley.edu>
>> Subject: Re: [sf-lug] resolver problem
>> Date: Fri, 08 Apr 2016 01:55:52 -0700
>
>> Curiouser and curiouser!  ;-)
>>
>> Well, I still find it quite puzzling, that, e.g.:
>> $ ping github.com
>> works, whereas:
>> $ curl -I https://github.com
>> fails with:
>> curl: (6) Could not resolve host: github.com
>> That particular combination is rather bizarre.
>>
>> Nevertheless, computers are quite logical things - they do exactly
>> what they're told (well, notwithstanding some hardware failures and
>> such, but even in such cases, they still follow the logic of their
>> programming and electronics - whatever it dictates from whatever
>> state they're in and go through).
>>
>> On with divide and conquer.  :-)  There's an answer down there
>> *somewhere*.
>>
>> So, I might suggest, to also make it a bit easier for those attempting
>> to follow along and isolate the cause of the issue(s)/problem(s),
>> perhaps keep a high-level summary table, and also a section on
>> particular diagnostic commands and their output (or relevant portions
>> thereof).
>> E.g.:
>>
>> results              operation/command
>> ------------------------------------------------------------------------
>> Server not found     Firefox https://github.com/
>> Could not resolve    curl -I https://github.com/
>> ok (workaround)      above, with IP of github.com added to /etc/hosts
>> ok                   Chromium https://github.com/
>> ok                   ping github.com
>> same as above        different user, same host
>> all above ok         when booted from Ubuntu 14.04[.x] DVD
>>
>> diagnostics/commands/output (or selected portions thereof):
>>
>> $ lsb_release -d || cat /etc/os-release
>> ...
>> $ uname -m
>> ...
>> $ cat /etc/resolv.conf
>> ...
>> $ fgrep hosts /etc/nsswitch.conf
>> ...
>> $ ip -4 a s
>> ...
>> $ ip -6 a s
>> ...
>> $ ip -4 r s
>> ...
>> $ ip -6 r s
>> ...
>> (if ip isn't on one's PATH, one can give full pathname, e.g. /sbin/ip)
>> It would be good to also show the output of the above commands when
>> booted from the Ubuntu DVD.  (You could save the output to another
>> system or some writable USB storage, etc.)
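>>
>> E.g., a quick way to capture all of those to one file (a rough
>> sketch; file name arbitrary, and the leading "> " below is the
>> shell's PS2 prompt):
>> $ for c in 'lsb_release -d' 'uname -m' 'cat /etc/resolv.conf' \
>> > 'fgrep hosts /etc/nsswitch.conf' \
>> > 'ip -4 a s' 'ip -6 a s' 'ip -4 r s' 'ip -6 r s'
>> > do printf '\n$ %s\n' "$c"; eval "$c"; done > diag.out 2>&1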
>>
>> You might also want to occasionally tag that saved output with
>> timestamps - that may be particularly valuable if at some point things
>> are found to have changed in their behavior.  E.g., occasionally use:
>> $ date -Iseconds
>> (yes, I know, the -I option is deprecated, but it is so dang handy).
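>> If one wants to avoid the deprecated option, roughly equivalent:
>> $ date +%Y-%m-%dT%H:%M:%S%z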
>>
>> Many have made other good/excellent suggestions and points ... Rick
>> Moen, particularly on looking at it from a bit more of a network and
>> network-connectivity perspective.
>>
>> I'll also mention, with modern webservers and virtual name hosting and
>> SNI, etc., hitting an http or https URL by IP address, rather than by
>> hostname, may not get the same results, even if the site is working
>> perfectly and as expected.  However, in both cases, one should at
>> least still be able to connect to the site (open a TCP connection to
>> the IP address and port - by name or by IP address).
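>>
>> E.g., to check just the TCP connect, by name vs. by IP address (IP
>> here from an earlier lookup of github.com, so it may well differ for
>> you):
>> $ nc -z github.com 443; echo $?
>> $ nc -z 192.30.252.123 443; echo $?
>> If the second succeeds where the first fails, that again points at
>> name resolution rather than network connectivity.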
>>
>> I'd also be very interested in seeing a tcpdump capture of UDP and
>> TCP traffic to port 53, taken while strace is run on each of these
>> commands:
>> $ curl -I https://github.com/
>> $ ping github.com
>> ... each captured and saved separately for those two commands.
>> strace output can be voluminous, and so can tcpdump's, but limited to
>> DNS traffic and to the short duration of running such a command, there
>> should be relatively little tcpdump data to capture.
>> On tcpdump, a few of the options I'd recommend; on Linux, one can do:
>> -i any
>>   (very handy if one isn't sure which interface(s) will be used)
>> -s 0
>> -n
>> -p
>> -w tcpdump.cap
>> The -w option saves the raw capture - one can then examine it with
>> tcpdump (-r) or wireshark or whatever - use suitably named files for
>> each separate capture (or rename them after completing the capture).
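>>
>> Putting those options together, e.g. (run as root or via sudo; the
>> filter limits the capture to port 53, i.e. DNS, both UDP and TCP):
>> $ sudo tcpdump -i any -s 0 -n -p -w tcpdump.cap port 53
>> ... then run the command being traced from another shell, and
>> terminate the tcpdump (e.g. with an interrupt) once done.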
>> I think you can guess what options I commonly use on strace:
>> $ cat ~/bin/Strace
>> #!/bin/sh
>> exec strace -fv -eall -s2048 ${1+"$@"}
>> The -o option can also be handy to save to a file.
>> And yes, I didn't detail all the various options I mentioned -
>> man(1) is very good at covering those.  :-)
>>
>> Oh, and of course, it's also good to keep a log of what you observe,
>> try, change, etc.
>>
>> So, yes, I find it odd that ping works, but curl fails with an apparent
>> resolver error - how might it be that it seems one resolves, but the
>> other doesn't?  That does seem quite odd.  But there's an answer down
>> there somewhere.
>>
>>
>>> From: "Alex Kleider" <akleider at sonic.net>
>>> Subject: Re: [sf-lug] resolver problem
>>> Date: Thu, 07 Apr 2016 08:53:04 -0700
>>
>>> On 2016-04-07 00:26, Michael Paoli wrote:
>>>> A few more divide & conquer things you could try ...
>>>>
>>>> So ... works with one browser, not another, works with ping,
>>>> but not git?  Efficient divide & conquer typically involves
>>>> devising a series of sufficiently easy/feasible tests that
>>>> effectively divide the possible causes into two sets of roughly
>>>> equal probability, where the results of each test rule one set in
>>>> and the other out - or at least significantly shift the
>>>> probability - after the test - between the two sets.  One can also
>>>> think of whittling it down
>>>> to the simplest test that can show the "works" vs. "doesn't work" -
>>>> at which point it's generally obvious what the problem is, or at least
>>>> exactly where the problem is.
>>>>
>>>> So ... "browsers" ... & http & https.
>>>>
>>>> What about wget and/or curl?
>>>>
>>>> E.g.:
>>>> $ curl -I https://www.google.com/
>>>> $ curl -I https://github.com/
>>>> and/or:
>>>> $ wget -q -O - https://www.google.com/ | head -c 256; echo
>>>> $ wget -q -O - https://github.com/ | head -c 256; echo
>>>>
>>>
>>> as per instructions, here are the facts:
>>> alex at x301:~$ curl -I https://google.com
>>> HTTP/1.1 301 Moved Permanently
>>> Location: https://www.google.com/
>>> Content-Type: text/html; charset=UTF-8
>>> Date: Thu, 07 Apr 2016 15:31:07 GMT
>>> Expires: Sat, 07 May 2016 15:31:07 GMT
>>> Cache-Control: public, max-age=2592000
>>> Server: gws
>>> Content-Length: 220
>>> X-XSS-Protection: 1; mode=block
>>> X-Frame-Options: SAMEORIGIN
>>> Alternate-Protocol: 443:quic
>>> Alt-Svc: quic=":443"; ma=2592000; v="32,31,30,29,28,27,26,25"
>>>
>>> alex at x301:~$ curl -I https://github.com
>>> curl: (6) Could not resolve host: github.com
>>>
>>> and
>>> alex at x301:~$ wget -q -O - https://www.google.com | head -c 256; echo
>>> <!doctype html><html itemscope=""  
>>> itemtype="http://schema.org/WebPage" lang="en"><head><meta  
>>> content="Search the world's information, including webpages,  
>>> images, videos and more. Google has many special features to help  
>>> you find exactly what you're looking
>>> alex at x301:~$ wget -q -O - https://github.com/ | head -c 256; echo
>>>
>>> alex at x301:~$
>>>
>>> Switching users: symptoms are the same:
>>> Chromium can resolve URLs, Firefox cannot, and
>>> pat at x301:~$ ifconfig
>>> eth2      Link encap:Ethernet  HWaddr 00:1c:25:9d:e1:f0
>>>         inet addr:10.0.0.16  Bcast:255.255.255.255  Mask:255.255.255.0
>>>         inet6 addr: fe80::21c:25ff:fe9d:e1f0/64 Scope:Link
>>>         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>>         RX packets:397957 errors:0 dropped:0 overruns:0 frame:0
>>>         TX packets:343874 errors:0 dropped:0 overruns:0 carrier:0
>>>         collisions:0 txqueuelen:1000
>>>         RX bytes:61678643 (61.6 MB)  TX bytes:30890987 (30.8 MB)
>>>         Interrupt:20 Memory:f0600000-f0620000
>>>
>>> lo        Link encap:Local Loopback
>>>         inet addr:127.0.0.1  Mask:255.0.0.0
>>>         inet6 addr: ::1/128 Scope:Host
>>>         UP LOOPBACK RUNNING  MTU:65536  Metric:1
>>>         RX packets:263600 errors:0 dropped:0 overruns:0 frame:0
>>>         TX packets:263600 errors:0 dropped:0 overruns:0 carrier:0
>>>         collisions:0 txqueuelen:0
>>>         RX bytes:27988121 (27.9 MB)  TX bytes:27988121 (27.9 MB)
>>>
>>> tun0      Link encap:UNSPEC  HWaddr  
>>> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
>>>         inet addr:172.31.2.14  P-t-P:172.31.2.13  Mask:255.255.255.255
>>>         UP POINTOPOINT RUNNING NOARP MULTICAST  MTU:1500  Metric:1
>>>         RX packets:149 errors:0 dropped:0 overruns:0 frame:0
>>>         TX packets:49 errors:0 dropped:0 overruns:0 carrier:0
>>>         collisions:0 txqueuelen:100
>>>         RX bytes:17045 (17.0 KB)  TX bytes:6174 (6.1 KB)
>>>
>>> pat at x301:~$ ping github.com
>>> PING github.com (192.30.252.120) 56(84) bytes of data.
>>> 64 bytes from github.com (192.30.252.120): icmp_seq=1 ttl=52 time=88.4 ms
>>> 64 bytes from github.com (192.30.252.120): icmp_seq=2 ttl=52 time=87.2 ms
>>> 64 bytes from github.com (192.30.252.120): icmp_seq=3 ttl=52 time=88.4 ms
>>> ^C
>>> --- github.com ping statistics ---
>>> 4 packets transmitted, 3 received, 25% packet loss, time 2999ms
>>> rtt min/avg/max/mdev = 87.244/88.044/88.452/0.565 ms
>>> pat at x301:~$ git clone https://github.com/alexKleider/debk.git
>>> Cloning into 'debk'...
>>> fatal: unable to access  
>>> 'https://github.com/alexKleider/debk.git/': Could not resolve  
>>> host: github.com
>>> pat at x301:~$
>>>
>>>
>>> Does this get us any closer to the root problem?
>>> Alex




