[sf-lug] [mini-]monitoring
Michael Paoli
michael.paoli at berkeley.edu
Sun Dec 7 17:24:34 PST 2025
Sometimes you don't want some complex monitoring architecture or
program, but something quite simple: e.g. just ping the host
occasionally but regularly, and if a ping fails, start alerting. That's
almost as simple as a one-liner, and it's pretty much what I had been
using: a pretty trivial program.
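For reference, the kind of near-one-liner described above might look something like this. It's a minimal sketch, not the author's earlier program; it assumes a ping(8) that accepts -n and -c, and the function name nag_when_down and the 10-second default interval are hypothetical:

```shell
#!/bin/sh
# Hypothetical sketch of the "trivial" approach: stay quiet while the
# host answers, then nag forever after the first failed ping.
# Usage: nag_when_down HOST [INTERVAL_SECONDS]
nag_when_down() {
  host=$1 interval=${2:-10}
  # Quiet phase: one ping per interval while replies keep coming back.
  while ping -n -c 1 "$host" >/dev/null 2>&1; do
    sleep "$interval"
  done
  # Alert phase: BEL (audible) plus a UTC-stamped DOWN line, every interval.
  while :; do
    printf '\a%s DOWN %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$host"
    sleep "$interval"
  done
}
```

Every single failure, however brief, starts the nagging, and nothing stops it short of killing the loop, which is exactly the annoyance described next.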
But sometimes that's a bit annoying, and you want something a wee bit
better, though still nothing horribly sophisticated or complex. The
alerting in particular can be annoying for (semi-)regular short,
single-event glitches (networking?), and for cases where things
self-heal and recover in short order. So, with a wee bit more coding,
I've improved it. Here's a bit of example use:
$ lping
2025-12-06T09:12:16Z 1765011676 UP
2025-12-06T14:12:40Z 1765031380 DOWN
2025-12-06T14:12:50Z 1765031390 UP
2025-12-07T04:12:04Z 1765080844 DOWN
2025-12-07T04:12:17Z 1765080857 DOWN 13s since 2025-12-07T04:12:04Z
2025-12-07T04:12:27Z 1765080867 DOWN 23s since 2025-12-07T04:12:04Z
2025-12-07T04:12:40Z 1765080880 DOWN 36s since 2025-12-07T04:12:04Z
2025-12-07T04:12:53Z 1765080893 DOWN 49s since 2025-12-07T04:12:04Z
2025-12-07T04:12:06Z 1765080906 DOWN 62s since 2025-12-07T04:12:04Z
2025-12-07T04:12:17Z 1765080917 DOWN 73s since 2025-12-07T04:12:04Z
2025-12-07T04:12:29Z 1765080929 DOWN 85s since 2025-12-07T04:12:04Z
2025-12-07T04:12:39Z 1765080939 DOWN 95s since 2025-12-07T04:12:04Z
2025-12-07T04:12:49Z 1765080949 UP
2025-12-07T15:12:30Z 1765121670 DOWN
2025-12-07T15:12:40Z 1765121680 UP
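One oddity in the log above: the minutes field reads 12 on every line, even where the epoch seconds in the second column cross minute boundaries (which is also why 04:12:06Z appears to follow 04:12:53Z). In date(1) formats %m is the month (December, hence 12) and %M is the minute, so a format containing %H:%m:%S prints the month in the minutes position. A quick check against one logged epoch value, assuming GNU date for the -d @SECONDS syntax:

```shell
#!/bin/sh
# Epoch value taken from the 04:12:04Z DOWN line in the log above.
t=1765080844
# Month in the minutes slot, as in the logged output: 2025-12-07T04:12:04Z
TZ=GMT0 date -d "@$t" +'%Y-%m-%dT%H:%m:%SZ'
# Minutes in the minutes slot: 2025-12-07T04:14:04Z
TZ=GMT0 date -d "@$t" +'%Y-%m-%dT%H:%M:%SZ'
```

The "Ns since" durations are computed from the epoch seconds, so those are unaffected.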
And, as configured, if it remains down for more than 10 minutes, it adds
an audible alert to those DOWN messages. I also added the capability to
shut it up for longer on a temporary basis. E.g. Comcast Business fscks
up again, and things are out and expected to stay out a while, or I
otherwise just want it to stay quiet until things are back up or it's
still down that much further into the future. For that there's SIGINT,
e.g. ^C from the keyboard: each one, while in the DOWN state, silences
the audible alerting for an additional 10 minutes, and each time it
reports the time it's now hushed until. So if one wants to shut it up
for, say, about 2 hours, one can hit ^C repeatedly until it displays
the desired time, and, well, know one hit the right number of ^Cs to
get it to that point.
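The ^C-extends-the-hush idea can be tried in isolation. This is a sketch of the same trap logic the script uses, with hushinc=600 (10 minutes) as an assumed value; lping reads its own from ~/.lpingrc. The shell signals itself with kill -INT to stand in for three ^C presses:

```shell
#!/bin/sh
hushinc=600      # assumed: 10 minutes of extra quiet per ^C
hush=            # empty means "not hushed"
trap '
  if [ -z "$hush" ]; then
    hush=$(($(date +%s) + hushinc))   # first ^C: quiet until now + hushinc
  else
    hush=$((hush + hushinc))          # later ^Cs: push the deadline further out
  fi
' INT
now=$(date +%s)
# Simulate three ^C presses by signalling this shell itself.
kill -INT $$
kill -INT $$
kill -INT $$
echo "hushed for about $(( (hush - now) / 60 )) minutes"   # prints 30 minutes
```

In lping itself the trap typically fires while the script is in sleep(1): ^C signals the whole foreground process group, so sleep is killed, and the shell then runs the trap and carries on with its loop.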
I also added handling of SIGHUP to have it reread its configuration,
because why not? :-)
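A sketch of the reread-on-SIGHUP pattern, using a throwaway temp file as a hypothetical stand-in for ~/.lpingrc and having the shell send itself the signal (from another terminal, kill -HUP with lping's PID does the same):

```shell
#!/bin/sh
# Temp file stands in for ~/.lpingrc; removed on exit.
cfg=$(mktemp) || exit
trap 'rm -f "$cfg"' EXIT
echo 'sleeps=10' > "$cfg"
. "$cfg"                              # initial read, as lping does at startup
trap '. "$cfg" || exit' HUP           # reread on SIGHUP, same idea as lping
echo "before: sleeps=$sleeps"         # before: sleeps=10
echo 'sleeps=30' > "$cfg"             # edit the config...
kill -HUP $$                          # ...then signal the running shell
echo "after: sleeps=$sleeps"          # after: sleeps=30
```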
I might tweak it further, but I'm already enjoying the enhanced
functionality. As one can see in the example, it displayed some of those
DOWN bits, so I'm aware of them, if I bother to look. But each outage
was sufficiently short that there was no need for it to bother me with
audible alerting, so ... it didn't, and it remained quiet.
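The decision that keeps short outages quiet comes down to two tests in the script's "remains down" branch: no audible alert while the outage is younger than alerts seconds, or while an unexpired hush is in effect. Pulled out as a standalone function (should_alert is a hypothetical name, not part of lping, and it uses global variables, so it's for illustration rather than for pasting into the script):

```shell
#!/bin/sh
# should_alert SECONDS_DOWN ALERTS NOW [HUSH]
# Succeeds (alert) only if the outage has lasted at least ALERTS seconds
# and there is no hush deadline later than NOW; same logic as lping.
should_alert() {
  s=$1 alerts=$2 now=$3 hush=${4:-}
  # A hush deadline at or before NOW has expired and counts as no hush.
  if [ -n "$hush" ] && [ "$hush" -gt "$now" ]; then
    return 1
  fi
  [ "$s" -ge "$alerts" ]
}
```

With alerts=600, the longest outage in the sample log (95s) stays silent: should_alert 95 600 1765080939 fails.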
Yeah, I typically use it to alert me when linuxmafia.com is down, as,
among other things, it also hosts the SF-LUG list.
Yep, it's "just" a shell script, but the program does its intended job
quite well enough.
Anyway, below are links to the current code and config file, and then
the code itself (reformatted slightly for email, with the command used
to do so).
https://www.mpaoli.net/~michael/bin/lping
https://www.mpaoli.net/~michael/.lpingrc
$ expand -t 2 < ~/bin/lping
#!/bin/sh
# vi(1) se tabstop=2

# read in our configuration:
. ~/.lpingrc || exit

ostate= # old (prior) state (initially not known)
  # 0 up/nominal
  # 1 down / check failed
  # null for unknown / not yet known

# Check
ck(){
  ping -n -c 1 "$IP" >/dev/null
}
# sleep(1) for some bit (delay)
Sleep(){
  sleep "$sleeps"
}

tT(){
  # set t to seconds since epoch
  # set T to human readable current UTC (Z) time
  # (%M is minutes; %m would be the month)
  eval "$(TZ=GMT0 date +"t='%s' T='%Y-%m-%dT%H:%M:%SZ'")"
}
# SIGHUP to reread our config
trap '
  . ~/.lpingrc || exit
' 1

hush= # silence alert up to this (epoch) time
# SIGINT to set/extend hush (date -d @epoch below is GNU date):
trap '
  case "$hush" in
    "")
      hush=$(($(date +%s) + hushinc))
    ;;
    *)
      hush=$((hush + hushinc))
    ;;
  esac
  printf '\''%s\n'\'' \
    "hushed until $(TZ=GMT0 date +%Y-%m-%dT%H:%M:%SZ -d @$hush)"
' 2
while :
do
  if ck; then
    state=0 # up/nominal
    hush= # (re)set
    case "$ostate" in
      0)
        # remains up
        : # nothing to report/do
      ;;
      *)
        # down --> up or initial up
        tT
        printf '%s\n' "$T $t UP"
      ;;
    esac
  else
    state=1 # down / check failed
    case "$ostate" in
      1)
        # remains down
        tT
        s=$((t - ds)) # seconds down
        [ -z "$hush" ] || [ "$hush" -gt "$t" ] || hush= # reset expired
        if \
          [ "$s" -lt "$alerts" ] ||
          [ -n "$hush" ]
        then
          ALERT=
        else
          ALERT="$alert"
        fi
        printf '%s\n' "$ALERT$T $t DOWN ${s}s since $Ds"
      ;;
      *)
        # up --> down or initial down
        tT # down since:
        ds="$t"
        Ds="$T"
        printf '%s\n' "$T $t DOWN"
      ;;
    esac
  fi
  ostate="$state"
  Sleep
done
$
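The config file isn't reproduced here, but the script expects it to set IP, sleeps, alerts, alert and hushinc. A hypothetical ~/.lpingrc consistent with the behavior described above (a check roughly every 10 seconds, audible alert after 10 minutes down, 10 more minutes of quiet per ^C); the actual file at the URL above may well differ:

```shell
# Hypothetical ~/.lpingrc sketch; lping reads it with ". ~/.lpingrc".
# Address (or host name) to ping; 192.0.2.1 is a documentation placeholder.
IP=192.0.2.1
# Seconds to sleep between checks.
sleeps=10
# Seconds continuously DOWN before DOWN lines get the alert prefix.
alerts=600
# Alert prefix: a BEL character, so the terminal beeps.
alert=$(printf '\a')
# Seconds of extra quiet added per SIGINT (^C).
hushinc=600
```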