[sf-lug] SF-LUG.COM. DNS & what happened in 2009-02
Michael Paoli
Michael.Paoli at cal.berkeley.edu
Tue Nov 17 00:11:59 PST 2009
For those that might not have seen it earlier, or might want a
refresher, here's information on a relatively similar problem
from earlier this year. Scenario was a bit different then,
but has quite a bit in common with the current situation.
For the details from earlier, have a read/skim through:
http://linuxmafia.com/pipermail/sf-lug/2009q1/006424.html
http://linuxmafia.com/pipermail/sf-lug/2009q1/006426.html
http://linuxmafia.com/pipermail/sf-lug/2009q1/006429.html
http://linuxmafia.com/pipermail/sf-lug/2009q1/006430.html
http://linuxmafia.com/pipermail/sf-lug/2009q1/006431.html
http://linuxmafia.com/pipermail/sf-lug/2009q1/006432.html
http://linuxmafia.com/pipermail/sf-lug/2009q1/006437.html
http://linuxmafia.com/pipermail/sf-lug/2009q1/006446.html
And for anything there referencing either of these URLs:
http://208.96.15.252/log.txt
http://www.sf-lug.com/log.txt
just have a look at the earlier data, reproduced below.
2009-02-27
Sometime today, folks started noticing problems with DNS for sf-lug.com., e.g.:
http://linuxmafia.com/pipermail/sf-lug/2009q1/006424.html
et seq.
2009-02-28 mpaoli
I noticed the DNS problems with sf-lug.com., and also found
http://linuxmafia.com/pipermail/sf-lug/2009q1/006424.html
et seq.
# fuser -n tcp 53
here: 53
53/tcp: 3319
# ps lwwwwwwwwwp 3319
F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
1 10741 3319 1 19 0 47800 2712 rt_sig Ssl ? 0:00 /usr/local/sbin/named-balug -u balugdns -c /etc/named-balug.conf -t /var/named/chroot-balug
... but the BALUG DNS server normally listens only on a different IP (208.96.15.254);
for sf-lug.com. the one we're interested in is 208.96.15.252.
# netstat -an | grep ':53 .*LISTEN'
tcp 0 0 208.96.15.254:53 0.0.0.0:* LISTEN
... so nothing is listening on 208.96.15.252 port 53 (i.e. sf-lug.com. DNS is down).
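The same "is anything listening on 208.96.15.252:53?" check can be expressed as a small Python sketch that parses the netstat lines. It assumes the `netstat -an` column layout shown above (proto, recv-q, send-q, local address, foreign address, state); the function name `tcp_listeners` is hypothetical. BEFORE is the output above, and AFTER is the post's later netstat output after the restart, with the mail-wrapped lines rejoined.

```python
# `netstat -an | grep ':53 .*LISTEN'` output from the post, wrapped lines rejoined.
BEFORE = "tcp 0 0 208.96.15.254:53 0.0.0.0:* LISTEN\n"
AFTER = """\
tcp 0 0 208.96.15.252:53 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:53 0.0.0.0:* LISTEN
tcp 0 0 208.96.15.254:53 0.0.0.0:* LISTEN
"""

def tcp_listeners(netstat_text, port=53):
    """Return local addresses that have a TCP socket in LISTEN state on `port`."""
    addrs = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Assumed columns: proto recv-q send-q local-addr foreign-addr state
        if len(fields) >= 6 and fields[0] == "tcp" and fields[5] == "LISTEN":
            host, _, local_port = fields[3].rpartition(":")
            if local_port == str(port):
                addrs.append(host)
    return addrs

SF_LUG_NS = "208.96.15.252"
print(SF_LUG_NS in tcp_listeners(BEFORE))  # False: nothing answering for sf-lug.com.
print(SF_LUG_NS in tcp_listeners(AFTER))   # True
```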
# uptime
04:17:54 up 14 days, 10:36, 9 users, load average: 0.00, 0.00, 0.00
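Gee, I wonder if BIND (the sf-lug.com. master) didn't get started after that reboot,
and the zone has an expire of 14 days? The slave would keep answering for the zone
until that expire time ran out, then stop. That would explain a lot.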
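The hypothesis boils down to simple arithmetic, sketched here in Python. It assumes the master went down at the reboot (so uptime approximates the time since the slave's last successful refresh) and uses the 1209600-second expire the zone file turns out to have; `slave_zone_expired` is a hypothetical helper name.

```python
from datetime import timedelta

# Assumption: master named never came back after the reboot, so
# "up 14 days, 10:36" roughly equals time since the slave last refreshed.
master_down_for = timedelta(days=14, hours=10, minutes=36)

# SOA expire value from the sf-lug.com zone file (seconds).
SOA_EXPIRE = 1209600

def slave_zone_expired(time_since_refresh, expire_seconds):
    """A slave that can't refresh from its master keeps answering for the
    zone until `expire` seconds pass, then stops serving it."""
    return time_since_refresh.total_seconds() >= expire_seconds

print(slave_zone_expired(master_down_for, SOA_EXPIRE))   # True
print(master_down_for - timedelta(seconds=SOA_EXPIRE))   # 10:36:00
```

Under that assumption, the second line is roughly how long ago the slave's copy of the zone expired.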
Confirm the *nix flavor:
# cat /etc/redhat-release
CentOS release 4.4 (Final)
# ls /etc/init.d/*named*
/etc/init.d/named /etc/init.d/named-balug
# chkconfig --list | fgrep named | fgrep -v balug
named 0:off 1:off 2:off 3:off 4:off 5:off 6:off
... named is off in every runlevel, so it wasn't started at boot; that explains a lot ...
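Reading that chkconfig line can be sketched in Python as below. It assumes runlevels 3 and 5 are the ones that matter (the usual multi-user runlevels on CentOS 4); `runlevel_states` is a hypothetical name.

```python
CHKCONFIG_LINE = "named 0:off 1:off 2:off 3:off 4:off 5:off 6:off"

def runlevel_states(line):
    """Parse one `chkconfig --list` line into (service, {runlevel: enabled})."""
    service, *fields = line.split()
    states = {}
    for field in fields:
        level, _, state = field.partition(":")
        states[int(level)] = (state == "on")
    return service, states

service, states = runlevel_states(CHKCONFIG_LINE)
# Assumption: multi-user boots land in runlevel 3 (text) or 5 (graphical).
print(service, "starts at boot:", states[3] or states[5])  # named starts at boot: False
```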
# ls -ld /etc/*named*conf*
lrwxrwxrwx 1 root root 44 May 12 2007 /etc/named-balug.conf -> /var/named/chroot-balug/etc/named-balug.conf
lrwxrwxrwx 1 root root 32 Mar 5 2007 /etc/named.conf -> /var/named/chroot/etc/named.conf
# rpm -qa | fgrep -i bind
bind-libs-9.2.4-24.EL4
bind-9.2.4-24.EL4
bind-chroot-9.2.4-24.EL4
bind-utils-9.2.4-24.EL4
ypbind-1.17.2-8
# ls -ld /var/named/chroot/etc/named.conf
-rw-r--r-- 1 root named 1853 May 14 2007 /var/named/chroot/etc/named.conf
# ls -ld /var/named/chroot/var/named/sf-lug.com
-rw-r--r-- 1 root root 440 Oct 29 2007 /var/named/chroot/var/named/sf-lug.com
# ls -ldu /var/named/chroot/var/named/sf-lug.com
-rw-r--r-- 1 root root 440 Nov 8 2007 /var/named/chroot/var/named/sf-lug.com
# (cd /var/named/chroot/var/named && df -k .)
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/md0 9612516 3450096 5674128 38% /
# mount | fgrep md0
/dev/md0 on / type ext3 (rw)
#
Checked that the filesystem containing the sf-lug.com zone file (presumably the
one we're interested in) isn't mounted ro or noatime; otherwise the file's atime
wouldn't be very useful/informative here.
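The ro/noatime check on that mount line can be sketched in Python, assuming the `DEV on DIR type FS (opts)` format shown above; both function names are hypothetical. It only looks for the two options named in the post. On newer kernels (later than the 2.6.9 used by CentOS 4), relatime also limits atime updates and would need the same caution.

```python
MOUNT_LINE = "/dev/md0 on / type ext3 (rw)"

def mount_options(line):
    """Return the option set from a `mount` output line: 'DEV on DIR type FS (opts)'."""
    opts = line[line.rindex("(") + 1 : line.rindex(")")]
    return set(opts.split(","))

def atime_is_meaningful(opts):
    # A read-only or noatime mount never updates atime, so atime would
    # say nothing about when BIND last read the zone file.
    return not ({"ro", "noatime"} & opts)

print(atime_is_meaningful(mount_options(MOUNT_LINE)))  # True
```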
... looks like BIND hasn't (re)read that file in quite a while (last accessed
Nov 8 2007). I'm probably looking at the correct file, but I haven't checked the
init configuration to confirm it's using that chroot location ... though it likely is.
Let's see if the relevant restart fixes it and confirms all those bits,
but first ...
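The `ls -ldu` reasoning can also be done from Python with `os.stat`, which reads the inode timestamps without itself updating the atime (unlike actually reading the file, e.g. with head). A minimal sketch; `last_access` and `read_since` are hypothetical names, and the zone path is only examined if it exists on the machine running this.

```python
import os
from datetime import datetime

def last_access(path):
    """Return the atime of `path` as a datetime (stat() doesn't change it)."""
    return datetime.fromtimestamp(os.stat(path).st_atime)

def read_since(path, when):
    """True if `path` has been accessed (e.g. loaded by named) after `when`.
    Only meaningful on a mount that records atime (not ro/noatime)."""
    return last_access(path) > when

ZONE_FILE = "/var/named/chroot/var/named/sf-lug.com"  # path from the post
if os.path.exists(ZONE_FILE):
    print(last_access(ZONE_FILE))
```

With the post's numbers, an atime of Nov 8 2007 predates the boot 14 days earlier, so `read_since(ZONE_FILE, boot_time)` would be False: named hadn't loaded the zone since the reboot.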
# head /var/named/chroot/var/named/sf-lug.com
$TTL 86400
$ORIGIN sf-lug.COM.
@ IN SOA ns1.sf-lug.com. jim.well.com. (
2007102904 ;Serial
3600 ;refresh period
3600 ;retry period
1209600 ;expire period
10800) ;minimum TTL period
;
IN NS ns1.sf-lug.com.
# echo '1209600/3600/24' | bc -l
14.00000000000000000000
#
Yup, ... 14 day expiration, as I suspected.
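Pulling the SOA timers out of the zone file and converting them can be sketched in Python, mirroring the `bc` calculation above. The parser is deliberately minimal and assumes plain integer seconds (no BIND unit suffixes such as `2w`) and an upper-case `SOA` token; `soa_timers` is a hypothetical name.

```python
ZONE_HEAD = """\
$TTL 86400
$ORIGIN sf-lug.COM.
@ IN SOA ns1.sf-lug.com. jim.well.com. (
2007102904 ;Serial
3600 ;refresh period
3600 ;retry period
1209600 ;expire period
10800) ;minimum TTL period
"""

def soa_timers(zone_text):
    """Return serial/refresh/retry/expire/minimum from a simple SOA record."""
    # Drop ';' comments and flatten the parenthesized multi-line record.
    text = " ".join(line.split(";", 1)[0] for line in zone_text.splitlines())
    tokens = text.replace("(", " ").replace(")", " ").split()
    i = tokens.index("SOA")  # then MNAME, RNAME, and the five numbers
    names = ("serial", "refresh", "retry", "expire", "minimum")
    return dict(zip(names, map(int, tokens[i + 3 : i + 8])))

t = soa_timers(ZONE_HEAD)
print(t["expire"] / 86400)  # 14.0  (same as: echo '1209600/3600/24' | bc -l)
print(t["retry"] / 3600)    # 1.0   retry interval, in hours
```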
# (umask 022 && chkconfig named on)
... umask 022: I don't trust Red Hat (and thus CentOS) quite enough to always
do the right thing, so I use 022 for anything that may install or modify files,
where I don't want the permissions to end up tighter than desired.
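What umask 022 buys here can be shown with a small Python sketch: a file created with the conventional 0666 request ends up 0644 under umask 022 (still world-readable, like the zone file above), but 0600 under a stricter umask such as 077, which would be too tight if another user (e.g. the named user) must read it. `mode_after_create` is a hypothetical helper.

```python
import os
import stat
import tempfile

def mode_after_create(umask, requested=0o666):
    """Create a file requesting `requested` permissions under `umask` and
    return the permission bits it actually ends up with."""
    old = os.umask(umask)
    try:
        with tempfile.TemporaryDirectory() as d:
            path = os.path.join(d, "probe")
            fd = os.open(path, os.O_CREAT | os.O_WRONLY, requested)
            os.close(fd)
            return stat.S_IMODE(os.stat(path).st_mode)
    finally:
        os.umask(old)  # restore the caller's umask

print(oct(mode_after_create(0o022)))  # 0o644
print(oct(mode_after_create(0o077)))  # 0o600
```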
# chkconfig --list | fgrep named | fgrep -v balug
named 0:off 1:off 2:on 3:on 4:on 5:on 6:off
# (cd / && umask 022 && service named start)
Starting named: [ OK ]
# netstat -an | grep ':53 .*LISTEN'
tcp 0 0 208.96.15.252:53 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:53 0.0.0.0:* LISTEN
tcp 0 0 208.96.15.254:53 0.0.0.0:* LISTEN
# ls -ldu /var/named/chroot/var/named/sf-lug.com
-rw-r--r-- 1 root root 440 Feb 28 04:30 /var/named/chroot/var/named/sf-lug.com
# date
Sat Feb 28 04:30:23 PST 2009
That looks much better ... and a nice fresh access time, from the (re)start and
hence (re)read, more recent than when I read the file myself with head, so I was
likely looking at the correct zone file. And the acid test ... does it
work? From elsewhere on the Internet:
$ dig @208.96.15.252 -t A sf-lug.com. +short
208.96.15.252
$ dig @208.96.15.252 -t A sf-lug.com. +short +tcp
208.96.15.252
$
Looks good!
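For the curious, here is a minimal Python sketch of the kind of query those dig commands send: an RFC 1035 message asking for the A record of sf-lug.com. with recursion desired set. Over TCP (`+tcp`) the same message is prefixed by a 2-byte length. This is a simplified illustration, not dig's exact bytes (modern dig also adds an EDNS OPT record); `build_query` and the fixed query ID are hypothetical choices.

```python
import struct

def build_query(name, qtype=1, qid=0x1234, tcp=False):
    """Build a minimal DNS query message (RFC 1035) for `name`.

    qtype 1 = A, qclass 1 = IN, RD flag set. Over TCP the message is
    prefixed with its length as a 2-byte big-endian integer."""
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)  # RD=1, QDCOUNT=1
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    msg = header + qname + struct.pack("!HH", qtype, 1)
    return struct.pack("!H", len(msg)) + msg if tcp else msg

print(len(build_query("sf-lug.com.")))           # 28
print(build_query("sf-lug.com.", tcp=True)[:2])  # b'\x00\x1c'
```

Sending it is one `socket.sendto(query, ("208.96.15.252", 53))` on a UDP socket, though that 2009 address may no longer be serving the zone.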
... from the earlier peek at the SOA, retry is 3600 seconds ... so, worst case,
the slave should be all better within an hour.
... and already, slave looks good:
$ dig @198.144.195.186 -t A sf-lug.com. +short
208.96.15.252
$ dig @198.144.195.186 -t A sf-lug.com. +short +tcp
208.96.15.252
$
... likely thanks to NOTIFY (BIND >= 8).
... peeking again at the named.conf and zone files: the slave is listed as an NS
for the zone, and nothing in named.conf would prevent BIND from sending NOTIFY
to it. So the master likely did send it, and thus the slave recovered much more
quickly than the one-hour retry worst case.
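The NOTIFY reasoning can be sketched in Python as an approximation of BIND's default (`notify yes;`) target selection: every host in the zone's NS records except the one named as the SOA MNAME (the primary itself), plus any `also-notify` addresses. This is a simplification; real BIND resolves the names to addresses and also skips its own addresses. The slave's NS name isn't given in the post, so `ns2.example.net.` is a hypothetical placeholder, and `notify_targets` is a hypothetical name.

```python
def notify_targets(ns_names, soa_mname, also_notify=()):
    """Approximate whom a master NOTIFYs after (re)loading a zone."""
    def norm(n):
        # DNS names compare case-insensitively (cf. sf-lug.COM. vs sf-lug.com.).
        return n.lower().rstrip(".")
    primary = norm(soa_mname)
    targets = [n for n in ns_names if norm(n) != primary]
    return targets + list(also_notify)

# ns2.example.net. stands in for the real slave's NS name (not in the post).
print(notify_targets(["ns1.sf-lug.com.", "ns2.example.net."], "ns1.sf-lug.com."))
# ['ns2.example.net.']
```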