[sf-lug] sf-lug.com. box questions, documentation, "rules of the road", policies, etc.
Michael.Paoli at cal.berkeley.edu
Wed Apr 25 07:39:18 PDT 2007
Still documenting more, but a few questions along the way ...
appropriate outage notification and out-of-band status page?
We earlier discussed outages, planned outages, etc.
Even have a place to document that further
Two particular items/questions occurred to me regarding that.
First of all, for planned outages, *who* do we want to notify,
and *how* do we want to notify them? Might that also depend on
circumstances, nature of outage (whole box down, or just some
important service(s)), duration and timing? Would we want to:
* do a wall on the system
* edit /etc/motd and/or /etc/issue
* e-mail the "pagermonkeys" on the box
* e-mail the sf-lug list
* and/or other?
and do we want to come up with "rules" (/guidelines) on what method(s)
should be used under what circumstances?
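One way to pin down such "rules" would be to write them as a small table or function that anybody can read and argue about. The following is only a rough sketch: the channel names mirror the list above, but the `Outage` fields and every threshold (one hour of notice, sixty minutes of downtime) are hypothetical placeholders, not anything we have agreed on.

```python
from dataclasses import dataclass

# Channel names mirror the options listed above; the thresholds used
# below are hypothetical placeholders, not agreed policy.
WALL = "wall on the system"
MOTD = "edit /etc/motd and/or /etc/issue"
PAGER = 'e-mail the "pagermonkeys"'
LIST = "e-mail the sf-lug list"


@dataclass
class Outage:
    whole_box: bool       # True if the whole box goes down
    minutes: int          # expected duration of the outage
    hours_notice: float   # lead time before the outage starts


def channels_for(outage: Outage) -> list[str]:
    """Return the notification methods to use for a planned outage."""
    chosen = [PAGER]                  # admins always hear about it
    if outage.hours_notice < 1:
        chosen.append(WALL)           # reach users logged in right now
    else:
        chosen.append(MOTD)           # seen at next login
    if outage.whole_box or outage.minutes >= 60:
        chosen.append(LIST)           # wide impact: tell the list too
    return chosen
```

For example, `channels_for(Outage(whole_box=False, minutes=5, hours_notice=0.5))` would pick the pagermonkeys e-mail plus a wall. Whatever rules we settle on, keeping them in one short place like this makes them easy to revise.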
Also out-of-band status page? It would be potentially very useful to
have some out-of-band (independent of that box, and preferably also
independent of that colo) status/notification page. E.g. it can be
highly useful to have an independent web page (could just be a wiki web
page somewhere) that indicates some status information (most notably
if/when any unexpected outage occurs - to indicate status and
estimate/guesstimate on return to service, but also a place for folks to
look during scheduled outages - such as if they didn't know in advance
about the outage). E.g. rather like:
out-of-band status page:
for status of:
Although it's possible to use standard LINUX/CentOS tools to get some
information on the hardware (e.g. CPU, disk sizes, some bits of
chipset information here and there), could someone document the
hardware details - e.g. make and model of the system, any particular
details of hardware/options installed, etc. Having such information
known (and documented!) could come in quite handy in
troubleshooting any items that may be hardware related, planning
certain optimizations and potential upgrades, etc. If someone is
able to at least provide the basic hardware information, we could
probably get that up on a wiki page, including hunting down relevant
reference information (e.g. links to more detailed hardware
specifications for particular make/model of items identified).
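As a stopgap for the part the box can report about itself, here is a minimal sketch that summarizes CPU information from /proc/cpuinfo-style text. It assumes the usual "key : value" layout with a blank line between processors and a "model name" field (as on x86 systems); the function names are made up for illustration. It says nothing about the make and model of the system itself, which still needs someone to document.

```python
def parse_cpuinfo(text: str) -> list[dict[str, str]]:
    """Split /proc/cpuinfo-style text into one dict per processor."""
    cpus, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current processor's stanza.
            if current:
                cpus.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        cpus.append(current)          # last stanza may lack a blank line
    return cpus


def summarize(text: str) -> str:
    """Return a one-line summary suitable for pasting into a wiki page."""
    cpus = parse_cpuinfo(text)
    models = sorted({c.get("model name", "unknown") for c in cpus})
    return f"{len(cpus)} CPU(s): {', '.join(models)}"
```

On the box itself, something like `summarize(open("/proc/cpuinfo").read())` should give a line that can go straight onto the wiki next to the hand-gathered details.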
More documentation/log stuff ...
I started two log files on the box - feel free to have a look at
the /home/admin/log* files (most notably /home/admin/log). The
general idea there is a human-readable (and fairly searchable by date,
or other criteria) log of system changes made, issues/bugs
noted/corrected, etc. Most notably, the idea here is to keep lots of
lesser details off of the wiki pages, where they would otherwise pile up
ad nauseam (could eventually get quite long). It's also often much
easier to drop information straight into a flat file, or copy/paste
from one, and not have to worry about wiki formatting goop and how to
get something to render as plain text.
The wiki pages are probably much more suitable for more general
documentation (e.g. policies, how-to, etc.) - such as things likely to
be revised over time (as opposed to continually appended to and not as
likely to be of more general interest). For a bit more of an idea,
have a look at /home/admin/log - it's already up to 82 lines - and
that's just covering a bit of usage/syntax, and noting and dealing with
a few minor issues. As I noted, such can get quite long: on my two
home systems, the equivalent files I maintain have each grown to be
in excess of 10,000 lines long. Not that we have to be *that* detailed
on the sf-lug.com. box - on my home systems I log, for example, all
package additions/removals/upgrades (including package version
information), bugs and hardware issues/problems encountered, hardware
changes, etc. Capturing/noting at least the more noteworthy
changes/issues for the sf-lug.com. box would probably be a good thing.
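To illustrate the "append only, searchable by date" idea, here is a small sketch. The entry layout (ISO date, login name, free text, one entry per line) is purely an assumption for the example and is not necessarily how /home/admin/log is actually laid out; the function names are hypothetical too.

```python
from datetime import date


def format_entry(when: date, who: str, text: str) -> str:
    """Build one flat-text log line: ISO date, login name, free text."""
    return f"{when.isoformat()} {who}: {text}"


def append_entry(path: str, entry: str) -> None:
    """Append an entry; the log is only ever added to, never rewritten."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(entry + "\n")


def entries_on(lines, day_prefix: str) -> list[str]:
    """Return entries whose date starts with day_prefix, e.g. '2007-04'."""
    return [ln.rstrip("\n") for ln in lines if ln.startswith(day_prefix)]
```

With dates leading each line, a plain grep for the date prefix does the same job as `entries_on`, which is much of the appeal of a flat file over wiki markup.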
Code of ethics? Should we add to the "rules of the road" / policy
something indicating an appropriate code of ethics? The more
experienced systems administrators likely consider one applicable
anyway, but, most notably for those who may be much newer to the
field, explicitly noting, or at least referencing, a code of ethics
would call attention to it, introduce it to those not already familiar
with it, and help develop and foster appropriate professionalism.
E.g. could add something roughly like:
Users of the system, and most notably systems administrators and any
other persons with any type of privileged access to the system, should
exercise appropriate professionalism and follow an appropriate code of ethics,
e.g. the LOPSA/SAGE/USENIX code of ethics:
Quoting Michael Paoli:
> Just a bit of a start (I plan to add more), but I put some of the
> information on the wiki. Feel free to correct anything that's incorrect,
> improve formatting/presentation, etc.