[sf-lug] resolver problem
Alex Kleider
akleider at sonic.net
Fri Apr 22 13:28:52 PDT 2016
Thanks, Michael, for your interest.
I've put in a brand new hard drive and reinstalled Ubuntu 14.04
... but I still have the 'problem' drive and hope some day to put it
back in, systematically try to answer all the questions posed, and
present them in an orderly manner using the outline you suggested.
Right now we are travelling up the coast (currently on Vancouver Island)
so won't get to it for a while yet.
Cheers,
Alex
On 2016-04-17 13:17, Michael Paoli wrote:
> So, Alex, ... I think at least some of us are curious ...
>
> Did you get to the bottom of your "resolver" problem yet?
> Or otherwise solve/fix it?
> Or still working on it?
> Any noteworthy developments/findings?
>
>> From: "Michael Paoli" <Michael.Paoli at cal.berkeley.edu>
>> To: "Alex Kleider" <akleider at sonic.net>
>> Cc: sf-lug at linuxmafia.com
>> Subject: Re: [sf-lug] resolver problem
>> Date: Fri, 08 Apr 2016 23:07:52 -0700
>
>> So ...
>>
>> What of telnet? Somewhere among the many postings, I said something
>> roughly along the lines of: use the simplest example that can
>> reproduce the problem. A browser - especially of the GUI form - is
>> quite large and complex. curl - much better. Even better - wget.
>> Better yet, telnet ... or a trace better yet, nc.
>>
>> I made mention of strace ... Rick also likewise made mention of how it
>> can be overkill / sledgehammer ... tons of output, etc. But whittled
>> down to simplest (whatever) to trace that can show the problem, it
>> becomes quite a bit easier to work with. In the case of nc, using
>> strace on that ... I get less than 230 lines of data. That may still
>> seem like "a lot" - but that's before even filtering out
>> "uninteresting" stuff. But perhaps more importantly - you indicate it
>> works when you boot from the ISO image, but not when running off the
>> operating system installed on the host's drive. So ... capturing
>> similar from both - same command and all, one working, one not,
>> respectively booted from ISO, and host's drive - a side-by-side
>> comparison would likely be highly informative. Right around where
>> they start to significantly differ is likely at, or exceedingly close
>> to, where our problem is. So, that could very substantially narrow it
>> down quite quickly. And particularly where it works in both cases
>> with ping, but fails with, e.g., Firefox and curl - as was pointed
>> out by Rick, highly probable there's some common issue to both that's
>> impacting resolution for many, but not all, things/programs - even
>> when trying to resolve the same name.
>>
>> So, if you do:
>> $ </dev/null telnet github.com 80
>> Do you get something like:
>> Trying 192.30.252.123...
>> Connected to github.com.
>> Escape character is '^]'.
>> Connection closed by foreign host.
>> $
>> Or do you get something like a resolver error?
>> Or better yet, do you still get an error resolving the name if you
>> use nc?
>> E.g.:
>> $ nc -z github.com 80
>> $ echo $?
>>
>> $
>> That nc works, resolves, and connects can be seen by its exit value
>> - or one could add the -v option. If I use a name that doesn't
>> resolve:
>> $ nc -z test. 80
>> nc: getaddrinfo: Name or service not known
>> $
>> (the test TLD is reserved, thus won't exist on The Internet
>> (I hear we're supposed to lowercase internet now ... whatever ...
>> but that does, however, introduce more ambiguity, as at least
>> historically, internet and The Internet have different meanings.)).
>> So, presuming nc fails for you in the problematic environment,
>> notably how do the outputs of these:
>> $ 2>&1 strace -fv -eall -s2048 nc -z github.com 80 | cut -c-80
>> compare between that problematic environment and booted from ISO?
>> Most notably, where do they differ significantly, including context
>> of, say, a half dozen or a bit more lines before and after?
>>
>> Oh, and you may want to capture the output without truncating to 80
>> character width as I show with cut(1) ... but it may be easier to
>> compare - at least by eyeball - when they're trimmed to 80 characters
>> (or even less); then it's much easier to eyeball 'em side-by-side ...
>> or one can rather effectively do that with suitable options to
>> diff(1).
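>>
>> E.g., a sketch (the filenames here are merely illustrative): capture
>> each run to a file with -o, once booted from the ISO and once from
>> the host's drive, then compare:
>> $ strace -fv -eall -s2048 -o iso.strace nc -z github.com 80
>> $ strace -fv -eall -s2048 -o hd.strace nc -z github.com 80
>> $ diff -U 8 iso.strace hd.strace | less
>> Or side-by-side:
>> $ diff -y -W 162 iso.strace hd.strace | less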
>>
>> And why do I ask for telnet or nc? I started futzing with strace,
>> looking for the simplest thing that may be able to capture the issue
>> you're experiencing.
>> I tried curl - nice that it has the -I option, to just do headers
>> and not follow redirects, etc. But I also found that curl does a
>> fork (or at least clone(2)) - unneeded complexity - wget avoids that,
>> but doesn't have quite as easy and simple a way to not follow
>> redirects, etc. http is also simpler than https. But we don't even
>> need HTTP negotiation, as presumably the issue shows before even
>> completing a TCP connect. So telnet, or even nc, should quite
>> suffice for that.
>>
>> And looking at strace output on nc vs. even telnet, there's slightly
>> less stuff to wade through with nc - and telnet less than wget, and
>> wget less than curl.
>>
>> So ... let's presume you also get the failure with nc (or if not that,
>> with telnet, or if not that, with wget, or if not that, with curl).
>> Though not the same network or hardware, I can boot DVD ISO pretty
>> conveniently ...
>> Ubuntu- 14.04.4 LTS "Trusty Tahr" - Release amd64 (20160217.1)
>> When I do the strace on nc, I get 228 lines of output (or 224 if the
>> lookup fails). If I use telnet (and discard telnet's stderr and
>> stdout and redirect its stdin from /dev/null) I get 238 lines of data
>> from strace:
>> $ </dev/null 2>&1 >>/dev/null strace -fv -eall -s2048 telnet \
>>> github.com 80 | cut -c-80 | wc -l
>> 238
>> $
>> In the above, the leading "> " is the shell's PS2 prompt. Similarly,
>> with wget, I get more lines. And with curl, yet more.
>>
>> And, exactly what OS versions? Here's what I show booted from ISO:
>> $ lsb_release -d; uname -m; 2>&1 id | fold -s -w 72; tty; env | fgrep LC
>> Description: Ubuntu 14.04.4 LTS
>> x86_64
>> uid=999(ubuntu) gid=999(ubuntu)
>> groups=999(ubuntu),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),108(lpa
>> dmin),124(sambashare)
>> /dev/ttyS0
>> LC_ALL=C
>> $
>> IIRC, you mentioned Ubuntu 14.[0]4, but I don't think you detailed it
>> more specifically. I do have a fair number of ISOs:
>> https://www.wiki.balug.org/wiki/doku.php?id=balug:cds_and_images_etc
>> But some are more conveniently at my fingertips:
>> $ (cd /var/tmp/ISOs && ls -d *.iso) | fgrep 14.04
>> kubuntu-14.04.2-desktop-amd64.iso
>> kubuntu-14.04.2-desktop-i386.iso
>> lubuntu-14.04.1-alternate-i386.iso
>> lubuntu-14.04.2-desktop-i386.iso
>> ubuntu-14.04.4-desktop-amd64.iso
>> ubuntu-14.04.4-desktop-i386.iso
>> ubuntu-14.04.4-server-amd64.iso
>>
>> Rick also made mention of ltrace - also an excellent tool. Alas, I
>> tend to underutilize it - probably mostly because I learned of, and
>> had been using, strace for quite a while before I even first heard of
>> ltrace. ltrace may also be quite useful (maybe even more useful?) in
>> this case, but strace might be more useful, e.g., in knowing what
>> resources are accessed, or where access attempts are made, which
>> attempts may fail and/or give unexpected results.
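>>
>> If one does want to try ltrace, a minimal sketch (the output filename
>> is merely illustrative):
>> $ ltrace -f -o nc.ltrace nc -z github.com 80
>> $ fgrep getaddrinfo nc.ltrace
>> That should show the library-level name resolution call(s) and their
>> return value(s).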
>>
>> You could also do strace on ping, for comparison purposes, e.g.:
>> $ 2>&1 strace -fv -eall -s2048 ping -c 3 github.com | cut -c-80
>> That actually works and gives me even fewer lines than nc ...
>> but alas, we also want a capture of an example that should work, but
>> doesn't. Anyway, a working example of ping from the OS on the host's
>> drive may also be useful in helping explain why ping works but, e.g.,
>> curl fails ... at least if we happen to also be curious about that.
>>
>> Oh, strace also has a handy -o option to save its output to a file.
>> You may want to do that, and can always do other stuff after capture
>> to compare, etc.
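>> E.g., a sketch (filename merely illustrative):
>> $ strace -fv -eall -s2048 -o nc.strace nc -z github.com 80
>> $ fgrep /etc/ nc.strace
>> The latter quickly shows which configuration files (resolv.conf,
>> nsswitch.conf, hosts, etc.) were accessed, and whether those opens
>> succeeded.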
>>
>> Oh, also, a point was made about possible drive error/corruption.
>> That's one of the reasons I earlier mentioned looking at logs - e.g.,
>> in case there are hardware errors, or other funkiness going on ...
>> logs and/or dmesg may show such errors.
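>> E.g., a sketch (the search patterns are merely illustrative):
>> $ dmesg | egrep -i 'error|fail|ata' | tail
>> $ sudo egrep -i 'error|fail' /var/log/syslog | tail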
>>
>>
>>> From: "Michael Paoli" <Michael.Paoli at cal.berkeley.edu>
>>> Subject: Re: [sf-lug] resolver problem
>>> Date: Fri, 08 Apr 2016 01:55:52 -0700
>>
>>> Curiouser and curiouser! ;-)
>>>
>>> Well, I still find it quite puzzling, that, e.g.:
>>> $ ping github.com
>>> works, whereas:
>>> $ curl -I https://github.com
>>> fails with:
>>> curl: (6) Could not resolve host: github.com
>>> That particular combination is rather bizarre.
>>>
>>> Nevertheless, computers are quite logical things - they do exactly
>>> what they're told (well, notwithstanding some hardware failures and
>>> such, but even in such cases, they still follow the logic of their
>>> programming and electronics - whatever it dictates from whatever
>>> state it's in and goes through).
>>>
>>> On with divide and conquer. :-) There's an answer down there
>>> *somewhere*.
>>>
>>> So, I might suggest, to also make it a bit easier for those
>>> attempting to follow along and isolate cause of the
>>> issue(s)/problem(s), perhaps keep a high-level summary table, and
>>> also a section on particular diagnostic commands and their output
>>> (or relevant portions thereof). E.g.:
>>>
>>> results            operation/command
>>> ------------------------------------------------------------------------
>>> Server not found   Firefox https://github.com/
>>> Could not resolve  curl -I https://github.com/
>>> (workaround for the above: add the IP of github.com to /etc/hosts)
>>> ok                 Chromium https://github.com/
>>> ok                 ping github.com
>>> same as above      different user, same host
>>> ok                 all of the above, when booted from Ubuntu 14.04[.x] DVD
>>>
>>> diagnostics/commands/output (or selected portions thereof):
>>>
>>> $ lsb_release -d || cat /etc/os-release
>>> ...
>>> $ uname -m
>>> ...
>>> $ cat /etc/resolv.conf
>>> ...
>>> $ fgrep hosts /etc/nsswitch.conf
>>> ...
>>> $ ip -4 a s
>>> ...
>>> $ ip -6 a s
>>> ...
>>> $ ip -4 r s
>>> ...
>>> $ ip -6 r s
>>> ...
>>> (If ip isn't on one's PATH, one can give the full pathname, e.g.
>>> /sbin/ip.)
>>> It would be good to also show the output of the above commands when
>>> booted from the Ubuntu DVD (you could save them to another system or
>>> some writable USB storage, etc.).
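>>> E.g., a sketch (the output pathname is merely illustrative, and the
>>> leading "> " is the shell's PS2 prompt):
>>> $ { date -Iseconds; lsb_release -d; uname -m
>>> > cat /etc/resolv.conf; fgrep hosts /etc/nsswitch.conf
>>> > ip -4 a s; ip -6 a s; ip -4 r s; ip -6 r s
>>> > } >/media/usb/diag.txt 2>&1
>>> Run that once from the host's drive, and once booted from the DVD,
>>> saving to suitably distinct filenames.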
>>>
>>> Might also want to occasionally tag that saved output with
>>> timestamps - that may be particularly valuable if at some point
>>> things are found to have changed in their behavior. E.g.,
>>> occasionally use:
>>> $ date -Iseconds
>>> (yes, I know, the -I option is deprecated, but it is so dang handy).
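>>> (And if -I ever does go away, a roughly equivalent GNU date spelling
>>> might be: $ date +%Y-%m-%dT%H:%M:%S%:z)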
>>>
>>> Many have made other good/excellent suggestions and points ... Rick
>>> Moen, particularly, on looking at it from a bit more of a network and
>>> network-connectivity perspective.
>>>
>>> I'll also mention: with modern webservers and virtual name hosting
>>> and SNI, etc., hitting an http or https URL by IP address, rather
>>> than hostname, may not get the same results, even if the site is
>>> working perfectly and as expected. However, in both cases, one
>>> should at least still be able to connect to the site (open a TCP
>>> connection to the IP address and port - by name or IP address).
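>>> E.g., a sketch (192.30.252.123 was one of github.com's addresses at
>>> the time; such addresses can change):
>>> $ nc -z 192.30.252.123 443 && echo TCP connect ok
>>> That should succeed even where a by-IP https URL would not return
>>> the expected page content.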
>>>
>>> I'd also be very interested in seeing a tcpdump capture of UDP and
>>> TCP traffic to port 53, while strace is run on each of these
>>> commands:
>>> $ curl -I https://github.com/
>>> $ ping github.com
>>> ... each captured and saved separately for those two commands.
>>> strace output can be voluminous, and so too for tcpdump, but limiting
>>> the capture to DNS traffic, and to the short duration of running such
>>> a command, should leave relatively little tcpdump data to capture.
>>> On tcpdump, a few of the options I'd recommend on Linux:
>>> -i any
>>> (that can be very handy if one isn't sure which interface(s) will be
>>> used)
>>> -s 0
>>> -n
>>> -p
>>> -w tcpdump.cap
>>> The last saves the raw capture - one can then examine it with tcpdump
>>> (-r) or wireshark or whatever - use suitably named files for each
>>> separate capture (or rename them after completing the capture).
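>>> Putting those together, a sketch (run it in a second terminal while
>>> running the strace'd command, then interrupt it; the capture
>>> filename is merely illustrative; "port 53" matches both UDP and TCP):
>>> $ sudo tcpdump -i any -s 0 -n -p -w dns-curl.cap port 53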
>>> I think you can guess what options I commonly use on strace:
>>> $ cat ~/bin/Strace
>>> #!/bin/sh
>>> exec strace -fv -eall -s2048 ${1+"$@"}
>>> The -o option can also be handy to save to a file.
>>> And yes, I didn't detail all the various options I mentioned,
>>> man(1) is very good at covering those. :-)
>>>
>>> Oh, and of course also, good to keep a log of what you observe, try,
>>> change, etc.
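>>> One simple way (a sketch - the filename is merely illustrative) is
>>> script(1), which records the whole terminal session:
>>> $ script -a ~/resolver-diag.typescript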
>>>
>>> So, yes, I find it odd that ping works, but curl fails with an
>>> apparent resolver error - how might it be that one seems to resolve,
>>> but the other doesn't? That does seem quite odd. But there's an
>>> answer down there somewhere.
>>>
>>>
>>>> From: "Alex Kleider" <akleider at sonic.net>
>>>> Subject: Re: [sf-lug] resolver problem
>>>> Date: Thu, 07 Apr 2016 08:53:04 -0700
>>>
>>>> On 2016-04-07 00:26, Michael Paoli wrote:
>>>>> A few more divide & conquer things you could try ...
>>>>>
>>>>> So ... works with one browser, not another, works with ping,
>>>>> but not git? Efficient divide & conquer typically involves
>>>>> devising a series of sufficiently easy/feasible tests that
>>>>> effectively divide the possible causes into two sets of roughly
>>>>> comparable probability, where the results of each test rule one
>>>>> set in and the other out - or at least significantly assist in
>>>>> determining the probability, after the test, between the two
>>>>> sets. One can also think of whittling it down to the simplest
>>>>> test that can show the "works" vs. "doesn't work" - at which
>>>>> point it's generally obvious what the problem is, or at least
>>>>> exactly where the problem is.
>>>>>
>>>>> So ... "browsers" ... & http & https.
>>>>>
>>>>> What about wget and/or curl?
>>>>>
>>>>> E.g.:
>>>>> $ curl -I https://www.google.com/
>>>>> $ curl -I https://github.com/
>>>>> and/or:
>>>>> $ wget -q -O - https://www.google.com/ | head -c 256; echo
>>>>> $ wget -q -O - https://github.com/ | head -c 256; echo
>>>>>
>>>>
>>>> as per instructions, here are the facts:
>>>> alex at x301:~$ curl -I https://google.com
>>>> HTTP/1.1 301 Moved Permanently
>>>> Location: https://www.google.com/
>>>> Content-Type: text/html; charset=UTF-8
>>>> Date: Thu, 07 Apr 2016 15:31:07 GMT
>>>> Expires: Sat, 07 May 2016 15:31:07 GMT
>>>> Cache-Control: public, max-age=2592000
>>>> Server: gws
>>>> Content-Length: 220
>>>> X-XSS-Protection: 1; mode=block
>>>> X-Frame-Options: SAMEORIGIN
>>>> Alternate-Protocol: 443:quic
>>>> Alt-Svc: quic=":443"; ma=2592000; v="32,31,30,29,28,27,26,25"
>>>>
>>>> alex at x301:~$ curl -I https://github.com
>>>> curl: (6) Could not resolve host: github.com
>>>>
>>>> and
>>>> alex at x301:~$ wget -q -O - https://www.google.com | head -c 256; echo
>>>> <!doctype html><html itemscope=""
>>>> itemtype="http://schema.org/WebPage" lang="en"><head><meta
>>>> content="Search the world's information, including webpages,
>>>> images, videos and more. Google has many special features to help
>>>> you find exactly what you're looking
>>>> alex at x301:~$ wget -q -O - https://github.com/ | head -c 256; echo
>>>>
>>>> alex at x301:~$
>>>>
>>>> Switching users: symptoms are the same:
>>>> Chromium can resolve URLs, Firefox cannot, and
>>>> pat at x301:~$ ifconfig
>>>> eth2 Link encap:Ethernet HWaddr 00:1c:25:9d:e1:f0
>>>> inet addr:10.0.0.16 Bcast:255.255.255.255
>>>> Mask:255.255.255.0
>>>> inet6 addr: fe80::21c:25ff:fe9d:e1f0/64 Scope:Link
>>>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>>>> RX packets:397957 errors:0 dropped:0 overruns:0 frame:0
>>>> TX packets:343874 errors:0 dropped:0 overruns:0 carrier:0
>>>> collisions:0 txqueuelen:1000
>>>> RX bytes:61678643 (61.6 MB) TX bytes:30890987 (30.8 MB)
>>>> Interrupt:20 Memory:f0600000-f0620000
>>>>
>>>> lo Link encap:Local Loopback
>>>> inet addr:127.0.0.1 Mask:255.0.0.0
>>>> inet6 addr: ::1/128 Scope:Host
>>>> UP LOOPBACK RUNNING MTU:65536 Metric:1
>>>> RX packets:263600 errors:0 dropped:0 overruns:0 frame:0
>>>> TX packets:263600 errors:0 dropped:0 overruns:0 carrier:0
>>>> collisions:0 txqueuelen:0
>>>> RX bytes:27988121 (27.9 MB) TX bytes:27988121 (27.9 MB)
>>>>
>>>> tun0 Link encap:UNSPEC HWaddr
>>>> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
>>>> inet addr:172.31.2.14 P-t-P:172.31.2.13
>>>> Mask:255.255.255.255
>>>> UP POINTOPOINT RUNNING NOARP MULTICAST MTU:1500 Metric:1
>>>> RX packets:149 errors:0 dropped:0 overruns:0 frame:0
>>>> TX packets:49 errors:0 dropped:0 overruns:0 carrier:0
>>>> collisions:0 txqueuelen:100
>>>> RX bytes:17045 (17.0 KB) TX bytes:6174 (6.1 KB)
>>>>
>>>> pat at x301:~$ ping github.com
>>>> PING github.com (192.30.252.120) 56(84) bytes of data.
>>>> 64 bytes from github.com (192.30.252.120): icmp_seq=1 ttl=52
>>>> time=88.4 ms
>>>> 64 bytes from github.com (192.30.252.120): icmp_seq=2 ttl=52
>>>> time=87.2 ms
>>>> 64 bytes from github.com (192.30.252.120): icmp_seq=3 ttl=52
>>>> time=88.4 ms
>>>> ^C
>>>> --- github.com ping statistics ---
>>>> 4 packets transmitted, 3 received, 25% packet loss, time 2999ms
>>>> rtt min/avg/max/mdev = 87.244/88.044/88.452/0.565 ms
>>>> pat at x301:~$ git clone https://github.com/alexKleider/debk.git
>>>> Cloning into 'debk'...
>>>> fatal: unable to access 'https://github.com/alexKleider/debk.git/':
>>>> Could not resolve host: github.com
>>>> pat at x301:~$
>>>>
>>>>
>>>> Does this get us any closer to the root problem?
>>>> Alex