[sf-lug] SF-LUG (& BALUG) VM infrastructure, etc. [Re: "all better" (up again): Re: (temporarily) down: SF-LUG (but lists up) ... likewise some BALUG bits down ...]

Tue Apr 19 00:20:44 PDT 2016

Sure, no problem, you're welcome.

And switching lists here - guesstimating probably of more interest to
SF-LUGers, as those on balug-admin at lists.balug.org are likely already
rather familiar with most of this (much of it's also been discussed at
some BALUG meetings).

So, yep, I don't have a highly available colo data center setup at my
residence.  Nevertheless *fairly* available - not exactly "high"
availability, but mostly pretty available and reasonably solid (and
backed up, etc., etc.).

So, yep, power goes out, site(s) go off-line - don't have a UPS for the
DSL modem, Ethernet switch, etc.  Laptop runs a while on battery, but
not all that long (I tend to squeeze a fair amount of work out of my
laptops - often working 'em as both laptop *and* server) ... so even
with a brand new, fully charged battery, it'll typically only run
roughly half as long as manufacturer specs say - as those are generally
based more on "typical" average consumer usage.  Have had UPSes before
(and do rather quite like 'em - especially when done "right" - for
certain definitions of right :-)), but don't have such a set-up
presently (probably will again eventually, but not in the short to
medium-term future).

For various reasons, also, laptop isn't set up to boot, etc. when power
is restored to it from a powered down state (and for various reasons I'm
not currently planning to change that, even if it might be feasible).
So ... once battery charge was too low, things went down (they were
already offline), and stayed down until a wee bit o' manual intervention
was applied.

Interestingly, ye olde 1U host "vicki" is set to power up upon
restoration of power from a powered down state.  So, when I arrived back
on-site, "vicki" was up and running.  But not the SF-LUG and BALUG bits
which run on a virtual machine.  Why?  Because they were running on the
laptop when the power went down (and when the laptop powered down).
Had they been on vicki at the time the power went out, they would've
come back up again without any manual intervention being needed.
But most of the time, that VM is running on the laptop (much quieter,
much lower power consumption, less (generally) excess heat in
residence).  Oh, I did also recently manage to take a reading of the
power consumption of "vicki".  When it's plugged in but not running,
it sucks about 5W - not a big deal, but a slight bit of waste -
whatever ... not all that bad for a standby state that can be brought up
via, e.g., Wake-on-LAN (which is how I most commonly power it up).
And ... powered up?  Depends a bit on how busy it is, but powered up and
idle, it sucks about 113W or so.  And powered up and actually doing some
work (e.g. disk I/O and/or more CPU activity) ... nominally it goes up
to around 115W.  Not a huge range ... but then, there's not a whole
helluva lot for it to do (it only has two hard drives, 2 GiB of RAM if I
recall correctly, and some pretty good CPUs - most of the time more CPU
horsepower than its common workloads really need or can put to use).
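
For a rough sense of scale, those readings can be turned into energy per
year.  This is only a back-of-the-envelope sketch: the helper name and
the assumption of a constant draw around the clock for a 365-day year
are illustrative, and only the wattages come from the readings above.

```python
HOURS_PER_YEAR = 24 * 365  # non-leap year; assumes constant draw all year


def annual_kwh(watts):
    """Energy in kWh used over one year at a constant draw of `watts`."""
    return watts * HOURS_PER_YEAR / 1000


# Wattages are the readings quoted above; the labels are just for display.
READINGS_W = {"standby": 5, "idle": 113, "busy": 115}

for state, watts in READINGS_W.items():
    print(f"{state:8s} {watts:4d} W  ~{annual_kwh(watts):7.1f} kWh/year")
```

So standby works out to roughly 44 kWh a year, versus roughly 990 kWh a
year powered up and idle (if it were left up all the time, which it
mostly isn't).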

Oh, and why would the VM sometimes come up if vicki is powered up, and
other times not?  Simple answer: state.  When I migrate the VM, I leave
it set to automatically restart upon reboot of the host, on the target
it's been moved to, and to not automatically restart upon reboot of the
host on the source it's been moved from.  This is so that in the more
nominal cases, should either physical host be powered up or (re)booted,
we don't end up with the VM starting on both hosts simultaneously (or on
neither).  At least presently, it would be a bad thing for it to come
up and be active on both hosts at the same time (e.g. IP address
conflict, and other problematic stuff).  Could possibly code to alter
behavior so both wouldn't come up, but there are also other issues with
that - again, back to state.  It's a stateful VM, so generally best not
to be arbitrarily firing up copies independently.  E.g. if they get
fired up independently and relatively haphazardly, questions arise such
as: which copy logged what activity and when, on which copy was a given
account's password changed and when, which copy has the update to some
wiki or web page, etc.  So for the most part
the VM is treated as a single host image, and not like two separate
independent images ... though in a crunch, they could be treated as
independent (e.g. one host self-destructs in a puff of smoke or
whatever, can restart VM on the other host - possibly also restoring
data from backups if, e.g., that's more current than when the VM was
last run on the surviving physical host).  I did once manage to
accidentally have the VM running on both hosts at the same time - not
pretty - and it effectively made the VM and its services mostly not
usable for the relatively brief bit while that was the case ("oops").
Anyway, the teensy bit of script I use when migrating the VM between the
hosts generally takes care of that quite nicely and just about
automagically - enabling autostart on completion of the move to the
target, and disabling it on the source moved from.
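
The autostart hand-off described above can be sketched roughly like so.
This is only a model of the bookkeeping, not the actual migration
script: the `migrate` and `autostart_hosts` functions and the dict of
flags are made up for illustration, and nothing here talks to a real
hypervisor.

```python
def migrate(autostart, source, target):
    """Return new autostart flags after moving the VM from source to target.

    `autostart` maps host name -> whether the VM autostarts when that
    host boots.  Hypothetical model only; no hypervisor is contacted.
    """
    if source == target:
        raise ValueError("source and target must differ")
    if source not in autostart or target not in autostart:
        raise KeyError("unknown host")
    flags = dict(autostart)   # leave the caller's dict untouched
    flags[target] = True      # VM now lives here: start it on host boot
    flags[source] = False     # stale copy: must not start on host boot
    return flags


def autostart_hosts(autostart):
    """Hosts on which the VM would start at boot; ideally exactly one."""
    return sorted(host for host, enabled in autostart.items() if enabled)


state = {"laptop": True, "vicki": False}
state = migrate(state, "laptop", "vicki")
print(autostart_hosts(state))  # prints ['vicki']
```

The point is that after every move exactly one host has autostart
enabled, so a reboot of either host never brings the VM up twice.  With
libvirt, for instance, the two steps might map onto `virsh autostart`
and `virsh autostart --disable`, but that's an assumption - which
tooling is actually in use isn't stated here.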

Other random thing on high(er) availability possibilities.  At some
point, may want to add more virtual/cloud into the mix.  E.g. can do
some AWS stuff for free to cheap.  I wouldn't want to do AWS as
*primary*, for various reasons, but *adding*, e.g. AWS (or another
virtual machine elsewhere) could certainly significantly increase the
general availability of the sites' services.  E.g. continue to use one
as "master", but feed/sync all or most all the data to the other, so for
most any "read-only" type access (which is the vast majority of the
access), either VM could serve up the data.  Anyway, mostly
just a thought for now - there are other higher priorities for me to
attend to - even within the scope of just LUG stuff to do.  Oh, also,
if BALUG or SF-LUG were 501(c)(3) or the like, there'd be additional
resources available out there for free ... but not a huge big deal, as
one can typically only depend so much upon "free" resources ... e.g.
sometimes they go away or cease to be free.  Oh, and those latter bits
of free - we're talkin' free as in beer, not free as in freedom.

Anyway, with nothing but donated resources (folks' time, equipment,
connectivity, power, domain costs, etc.) ... well, we work with, as
feasible, what we've got to work with ... and optimize, as feasible,
within those constraints.

Another thought ... Raspberry Pi.  Could at some point take what's
presently on the VM, and run it on a Raspberry Pi.  With the newer Raspberry
Pi, may even be feasible to virtualize the LUG stuff ... though whether
or not it's worth doing so, and exactly how feasible, is certainly
highly debatable.  There are also various advantages and disadvantages
to going with Raspberry Pi and/or incorporating such.  E.g. yet another
architecture to support, and that may also limit the feasibility of
what can be run virtual or not, and when/where.  E.g. from what I've
read, it is quite feasible to emulate a Raspberry Pi well, so one can
run an entire Raspberry Pi as a virtual machine.  But given differences
in architecture, etc., I'm not sure how feasible that would be, e.g. for
performance reasons.  And a single architecture does make things quite a
bit simpler and more efficient to manage (e.g. at present, my laptop,
the VM, and vicki are all Debian GNU/Linux 8.x (jessie) amd64).  Anyway,
I don't see such Raspberry Pi stuff being implemented in at least the
shorter term, but it may remain at least a possibility.  Such might also
function to provide higher availability, as a read-only (or mostly so)
additional instance of the sites.  But again, I think we're talking
longer-term.  I think there are
higher priority things to attend to currently, particularly given
resource situations, etc.

> To: balug-admin at lists.balug.org
> From: jim <jim at well.com>
> Subject: Re: [BALUG-Admin] "all better" (up again): Re:  
> (temporarily) down: SF-LUG (but lists up) ... likewise some BALUG  
> bits down ...
> Date: Tue, 19 Apr 2016 03:33:01 +0000

> Thank you, Michael.
>
> On 04/19/2016 01:45 AM, Michael Paoli wrote:
>> DO NOT REPLY ALL (unless you're subscribed to BOTH lists)
>>
>> "all better" - these sites/resources are up and online again.
>>
>> PG&E had a power outage for approximately 1 hour and 20 minutes,
>> spanning about:
>> 2016-04-18T15:56+0000--2016-04-18T17:16+0000
>> 2016-04-18T08:56-0700--2016-04-18T10:16-0700
>>
>> so things went offline, then down, and then needed slight bit of
>> hands-on to bring everything back up.
>>
>> Sorry for any inconvenience.
>>
>>> From: "Michael Paoli" <Michael.Paoli at cal.berkeley.edu>
>>> Subject: (temporarily) down: SF-LUG (but lists up) ... likewise  
>>> some BALUG bits down ...
>>> Date: Mon, 18 Apr 2016 09:38:19 -0700
>>
>>> DO NOT REPLY ALL (unless you're subscribed to BOTH lists)
>>>
>>> Looks like quite similar again, ... was up earlier this morning (as late as
>>> at least 7:45am), noticed sometime before 9:15am it was down/offline.
>>> POTS appears okay, internet side of DSL link looks good,
>>> will investigate further when I'm on-site again (sometime
>>> this evening or so).
>>>
>>>> From: "Michael Paoli" <Michael.Paoli at cal.berkeley.edu>
>>>> To: SF-LUG <sf-lug at linuxmafia.com>, BALUG-Admin,  
>>>> <balug-admin at lists.balug.org>
>>>> Subject: (temporarily) down: SF-LUG (but lists up) ... likewise  
>>>> some BALUG bits down ...
>>>> Date: Tue, 22 Mar 2016 15:14:20 -0700
>>>
>>>> DO NOT REPLY ALL (unless you're subscribed to BOTH lists)
>>>>
>>>> FYI, I noticed short bit ago (they were up and online
>>>> earlier today),  that offline are:
>>>>
>>>> SF-LUG:
>>>> ([www.]sf-lug.{org,com,info}), etc.
>>>> but lists remain up and on-line
>>>>
>>>> also
>>>> BALUG
>>>> on-line: [www.]balug.org & lists,
>>>> all other balug bits (e.g. wiki, archive other than list archive, etc.)
>>>> still presently off-line.
>>>>
>>>> I expect to have more information (and hopefully have whatever is  
>>>> the issue
>>>> corrected) by later this evening or so.
>>>>
>>>> (POTS line appears operational, but not reaching the subnet via
>>>> DSL - I'll know more when I'm on-site again).