[sf-lug] Ubuntu connectivity problem(s)
Rick Moen
rick at linuxmafia.com
Thu Apr 7 16:39:50 PDT 2016
Quoting Alex Kleider (akleider at sonic.net):
> Using Ubuntu 14.04:
> During boot up the usual Ubuntu logo appears with the blinking dots
> beneath them but in addition a new message has started to appear:
> first it's: Waiting for network configuration...
> then changes to: Waiting up to 60 more seconds for network
> configuration ...
> and finally: Booting without network configuration.
>
> Once booted:
[Snip /sbin/ifconfig output showing no IP assignments]
OK, forgive me if I'm being really dense, which certainly is far more
common than I like to admit, but is the problem 'Ubuntu machine's
wired and/or wireless network interfaces don't get IPs'?
Again, I may be utterly confused, but I'm still not totally sure what
the problem to be solved _is_. (But, don't fret, I'm going with 'yes'
to the above answer.)
Further down, you make reference to some kinda unspecified
'gateway/router/access_point'. Am I correctly guessing that you
expect host x301 to successfully get DHCP leases from the
'gateway/router/access_point' on an ethernet port or a wireless port or
both? I mean, what was the normal state of affairs that is now not
normal and is to be explored and changed or accounted for?
Below the /sbin/ifconfig output you speak of the laptop 'functioning fine
for several months until these recent symptoms began': Does 'these
symptoms' refer to '(a) ethernet port doesn't get DHCP lease', or (b)
'wireless port doesn't get DHCP lease', or (c) both, or (d) something
else?
Back in dinosaur days, host interfaces on a machine with a TCP/IP stack
had IP addresses because they were assigned locally, i.e., as static IP
assignments. Later, DHCP became so prevalent that many people assume
DHCP is like the availability of oxygen, and don't bother to think how
it works or consider the possibility of it failing because the _source_
had a problem -- or the cabling, or the wireless transport, or something
else. But that's the problem with hidden complexity that you cease to
look at: The day it misbehaves, you suddenly have a complex diagnostic
landscape.
Not intending to complain, but all you've said is there's a Lenovo
ThinkPad model X301 that by some unspecified jiggery-pokery you expect
to do networky things, and somewhere in the picture there's a vaguely
referenced 'gateway/router/access_point'. One hopes there's also,
y'know, ethernet cables, configured state in the wireless bridge
(probably not technically a router) that is sufficient to run a working
ESSID, and other real-world necessities.
DHCP isn't magic-wand material. The DHCP server device must be turned
on, configured, booted and functional. It must be issuing DHCP offers
over functional transport (wired or not) when it receives DHCP requests.
The client devices must send out those DHCP request broadcasts, over
likewise functional transport (wired or not). The functionality on both
ends relies on both hardware and software working right. The
functionality in the middle relies on the physical layer (e.g., the
ethernet cable or wireless transport, etc.) working right.
In that regard, this thing you say is incredibly useful:
> If booted up from a 'live CD' all works as expected- no connectivity
> problems at all: wifi works fine.
Hallelujah -- and congratulations on doing an exceptionally intelligent
and useful thing: cross-checking using a live CD. As I said,
functional DHCP requires both working hardware and working software on
each end. Your live-CD crosscheck, in one stroke, cleared away a vast
number of unknowns.
That is, since you say network connectivity reliably, consistently Does
the Right Thing when using a live CD, we can now absolve the DHCP
server's hardware and software, and the transport between the two
devices, and the Lenovo ThinkPad x301's hardware. The source of your
problem has been isolated to something going wacky with your Ubuntu
14.04 OS load.
One minor thing bothers me in the above: You say 'wifi works fine':
What about the ethernet? Do we not care about it? Is it unused and not
cabled, hence being ignored in the current problem posed?
I ask not to be a pain in the tuchis but rather because long experience
has made me very wary of solving the wrong problem. I'd not want to
worry about the ethernet non-IPing and then, much later, you say 'Oh,
I'm not using ethernet' -- or, alternatively, ignoring ethernet and
later you say 'That's nice that we've solved the wifi problem, but what
about my dodgy ethernet service?'
My _guess_ is that you intended to say 'Oh, this is solely about wifi.
Ethernet works fine.' Or: 'Oh, this is solely about wifi. I have
absolutely no bloody idea whether wired TCP/IP to my
gateway/router/access_point works, because I've never tried it and don't
use it.'
If the latter, you might want to switch gears for a moment and _test_
wired ethernet, even if you ordinarily wouldn't use it. Why? Because,
although your live-CD test commendably isolated the problem to your
laptop's Ubuntu 14.04 OS load, at this point it's unclear whether the
problem is further isolated to wifi only, or to the TCP/IP stack as a
whole, or what. If you can never get a DHCP lease over wired ethernet
using the loaded Ubuntu 14.04 OS, but always get one over wired ethernet
using a live CD, then that tells you you have a systemwide TCP/IP
problem in the loaded OS. In the alternative, if DHCP over wired
ethernet always works, then you have an Ubuntu-specific wifi problem.
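To make that reasoning concrete, here is a minimal Python sketch of the
decision tree. The function name isolate_fault and its arguments are
hypothetical, made up for illustration; nothing in it probes the
network. You run the tests by hand and feed in what you observed.

```python
def isolate_fault(live_cd_wifi_ok, installed_wired_ok, live_cd_wired_ok=True):
    """Map hand-collected cross-check results to the likeliest fault location.

    Every argument is a boolean you fill in after testing:
      live_cd_wifi_ok    - wifi gets a DHCP lease when booted from the live CD
      installed_wired_ok - wired ethernet gets a lease under the installed OS
      live_cd_wired_ok   - wired ethernet gets a lease under the live CD
    """
    if not live_cd_wifi_ok:
        # Even a known-good OS fails, so the installed OS is not (solely) at fault.
        return "outside the installed OS (hardware, transport, or DHCP server)"
    if not live_cd_wired_ok:
        # The wired path is broken regardless of OS, so it proves nothing yet.
        return "wired path itself (cable, port, or server); fix before comparing"
    if installed_wired_ok:
        # Wired works under the installed OS; only wifi misbehaves there.
        return "wifi-specific problem in the installed OS"
    # Wired works from the live CD but not from the installed OS.
    return "systemwide TCP/IP problem in the installed OS"


print(isolate_fault(live_cd_wifi_ok=True, installed_wired_ok=False))
print(isolate_fault(live_cd_wifi_ok=True, installed_wired_ok=True))
```

The point of writing it out is only that each test removes one branch;
the wording of the conclusions is mine, not a formal taxonomy.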
[You took down and brought back up the wlan3 interface using ifdown and
ifup:]
> eth2:avahi Link encap:Ethernet HWaddr 00:1c:25:9d:e1:f0
> inet addr:169.254.11.114 Bcast:169.254.255.255 Mask:255.255.0.0
> UP BROADCAST MULTICAST MTU:1500 Metric:1
> Interrupt:20 Memory:f0600000-f0620000
and
> wlan3:avahi Link encap:Ethernet HWaddr 00:21:6b:a1:81:44
> inet addr:169.254.7.51 Bcast:169.254.255.255 Mask:255.255.0.0
> UP BROADCAST MULTICAST MTU:1500 Metric:1
Personally, I seriously, absolutely do not trust ZeroConf networking, as
it's flaky and introduces pointless complexity and an avoidable random
factor into networks. So, me, I'd have hastened to shut off the damned
avahi daemon (implementation of IETF ZeroConf).
That being said, the above snippet reveals that ZeroConf autoassigned IPs
to eth2 and wlan3.
> Somehow avahi has gotten involved and wlan3:avahi has an ip address
> but not one compatable with my 10.0.0.0/24 lan network (where the
> gateway/router/access_point is at 10.0.0.2.)
Yeah, well, goddamned ZeroConf is a world unto itself. That's not
DHCP-assigned; it's auto-assigned. I mean that literally: The
ZeroConf-infested (er, -enabled) interface just picked that.
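Addresses in 169.254.0.0/16 are the IPv4 link-local range, which is
what an interface hands itself when no DHCP lease arrives. As a rough
sketch, Python's standard ipaddress module can tell such an address
apart from one on the 10.0.0.0/24 LAN mentioned in the quoted text. The
helper name classify is hypothetical.

```python
import ipaddress

# The LAN described in the quoted message; adjust for your own network.
LAN = ipaddress.ip_network("10.0.0.0/24")


def classify(addr):
    """Say whether an IPv4 address is self-assigned, on the LAN, or neither."""
    ip = ipaddress.ip_address(addr)
    if ip.is_link_local:
        # 169.254.0.0/16: picked by the interface itself, not leased via DHCP.
        return "link-local (self-assigned, no DHCP lease)"
    if ip in LAN:
        return "on the LAN"
    return "other"


# The two addresses from the ifconfig output, plus the gateway for contrast.
for addr in ("169.254.11.114", "169.254.7.51", "10.0.0.2"):
    print(addr, "->", classify(addr))
```

Seeing a link-local address on an interface that ought to be on the LAN
is therefore a symptom of the missing lease, not an explanation of it.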
The idea behind ZeroConf is to allow IP autoassignment and automatic
device discovery. Like, imagine you have a network printer on your home
LAN (whatever your home LAN is; might be wireless). Imagine that it
arrived out of the box with ZeroConf enabled. In the happy shiny world
of ZeroConf, when you connect your laptop to the home LAN, if it _too_
has ZeroConf enabled on one or more network interfaces, it
auto-negotiates its own IP on each such interface, then sends
out copious ZeroConf announcements over the network transport(s) saying
'Hi, I'm a lonely ZeroConf device. Are there any friendly other
ZeroConf devices within range?' The printer talks back, they fall in
love, and your laptop (at least semi-) autoconfigures to talk to the
printer's queue and send whatever is the needed print language.
And, in practice, it's flaky as all hell.
At my house, my wife deployed a Lexmark laser printer on the home LAN,
and she and her mother both took the path of least resistance in setting
up printing from their laptops, defaulting to ZeroConf transport to it.
Me, I don't like automagical network configuration. I want devices to
do exactly the network operations they've been configured to do. So, I
did this:
1. Visit printer. Make it print out a configuration sheet including its
IP address, just to make sure you know what it is.
2. Open Web browser to http://127.0.0.1:631 , to configure CUPS on your
computer. Set up a semi-old-school IPP-type print object, like
ipp://10.0.1.7/lexx -- typing in the previously determined printer IP
rather than automagically avoiding the need to do so. Configure the
desired printer type (Lexmark e250dn), in order to ensure appropriate
choice of printer language.
3. Have CUPS send a test page to check results.
Many months later, both Deirdre and her mother Cheryl suffered sudden
and mysterious inability to print, while I continued to have no problem.
The difference? I was using old-school IPP protocol and not using
device autodiscovery. They were both defaulting to ZeroConf by
following the path of least resistance and enjoying autodiscovery, which
seemed like a good idea until it suddenly mysteriously didn't work.
But I digress. Your ZeroConf-derived network activity is, seems to me,
a small circus adding needless confusion. Make up your own mind what to
do. Me, I'd sh**can it.
Maybe the ZeroConf circus is complicating and interfering with what
_ought_ be happening with DHCP from your 'gateway/router/access_point'.
Maybe even Chromium is able to take advantage of that somehow, though I
have no idea how. Or maybe it's irrelevant.
Personally, I just turn off the damned stuff, and accordingly my
knowledge of its routine 'normal' (heh!) operation is necessarily a
little sketchy.
> I then launch Firefox and put in my email url
> https://webmail.sonic.net/
> and get a "Firefox can't find the server at webmail.sonic.net." message.
Divide the problem into two parts. Firefox needs to first resolve FQDN
'webmail.sonic.net' to an IP address using the system resolver library,
and then needs to open a socket to 443/tcp (HTTPS) on that IP address.
So, test the second half separately by asking Firefox to open
an HTTPS connection to 443/tcp on the IP address. What IP address?
Currently this one:
$ host webmail.sonic.net
webmail.sonic.net has address 69.12.208.39
$
So, try URL https://69.12.208.39/
This is kinda 'TCP/IP 101', Alex. If the whole megillah (of a network
service) doesn't work including DNS resolution, try without needing DNS
resolution first, to cut the problem space in two.
If Firefox is able to load https://69.12.208.39/ , then it's a problem
with Firefox's access to DNS. If not, it's a problem with Firefox's
access to lower-level TCP/IP functions. That's an example of what we
(in diagnostic circles) mean when we say 'Divide the problem into
smaller pieces.'
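The same split can be done outside the browser. What follows is a
minimal Python sketch, not a polished tool: split_check is a
hypothetical name, and the resolve and connect parameters exist only so
the logic can be exercised without a live network. It tests name
resolution and a raw TCP connection to a known IP separately, then
reports which half failed.

```python
import socket


def split_check(hostname, known_ip, port=443, timeout=5.0,
                resolve=socket.gethostbyname,
                connect=socket.create_connection):
    """Test DNS resolution and TCP reachability as two separate steps."""
    # Step 1: can the system resolver turn the name into an address?
    try:
        resolve(hostname)
        dns_ok = True
    except OSError:  # socket.gaierror is a subclass of OSError
        dns_ok = False

    # Step 2: can we open a TCP socket to the already-known IP, no DNS needed?
    try:
        connect((known_ip, port), timeout=timeout).close()
        tcp_ok = True
    except OSError:  # refused, unreachable, and timeout errors all land here
        tcp_ok = False

    if not tcp_ok:
        return "lower-level TCP/IP connectivity is broken"
    if not dns_ok:
        return "DNS resolution is broken; TCP/IP underneath works"
    return "both DNS and TCP work; look elsewhere"
```

With a network attached, calling
split_check('webmail.sonic.net', '69.12.208.39') runs both halves for
real. The IP is the one 'host' returned above and may well have changed
since, so look it up afresh from a machine whose DNS works.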
I'm guessing you'll find that it's not a problem with Firefox's ability
to reach remote TCP sockets; it's a problem with its access to DNS
resolution.
> Now when using the command line: a git push fails although ping finds
> the IP address without problems:
Again, this strongly suggests 'git push' is able to reach the remote
socket over the implied network transport (HTTPS in this case, I guess),
but that transport's access to DNS resolution is broken.
What to do, to get out of this situation? I'm strenuously avoiding
saying something snide about Canonical, Ltd. quality control and
testing. ;-> Short of making irritating distro comments and
suggestions, I hope the above is useful.
If the live CD you used for testing is _also_ Ubuntu 14.04, then that
would, I think, be particularly revealing, as it would suggest that
there's nothing _generically_ wrong with Ubuntu 14.04's networking, and
there's merely something b0rked in the local installation or its
conffiles. The results you related with an unnamed live CD distro
already point towards that likelihood.