BCC ...
Rajib Bandopadhyay
bkpsusmitaa at gmail.com
Sun Jan 4 01:34:35 PST 2015
Dear Michael,
I was referring to precisely _the_ unsafe practice of hitting the
reply button almost as a reflex action. Please use the BCC field while
replying, perhaps spending ... what, another 2-3 seconds, maybe :)
I hope you will give the advice some thought ;)
No hurt or insult intended. And please do write in first person :)
My best regards,
Rajib Bandopadhyay
On 04/01/2015, Michael Paoli <Michael.Paoli at cal.berkeley.edu> wrote:
> I hit "Reply-all". The email addresses shown in the resultant reply
> are thus no more and no less than those that I - and all those same
> recipients - had already been sent in the To and/or CC lines (and
> From and Reply-To headers) of the email I was replying to. So
> nothing regarding email addresses was shared with anyone that had
> not already been passed to them.
>
>> From: "Rajib Bandopadhyay" <bkpsusmitaa at gmail.com>
>> Subject: Re: diagnose/fix boot/software issue: (e.g.) linuxmafia.com host
>> Date: Sun, 4 Jan 2015 14:03:53 +0530
>
>> Dear Michael,
>> You are an educated man, so I expected you to at least post our
>> email IDs in the BCC line, so that our IDs don't become public.
>> Why do you breach the safety rules?
>> Please consider this email as advice.
>> Regards,
>> Rajib Bandopadhyay
>>
>> On 04/01/2015, Michael Paoli <Michael.Paoli at cal.berkeley.edu> wrote:
>>> Rick,
>>>
>>> Just a thought, if it might be useful/helpful.
>>>
>>> Lots of "if"s, including the above ;-) but ...
>>>
>>> I was also thinking ...
>>>
>>> if most or all of the problems to be resolved on the host system
>>> that experienced the failure and is (presumably) still having boot
>>> issues ("bizarre GRUB errors" ...) are software and (at least
>>> mostly) not hardware issues ...
>>>
>>> I was thinking we may be able to effectively "crowd source" much
>>> of the work on diagnosing and finding the relevant fix/correction,
>>> namely, I was thinking ...
>>>
>>> you could provide the relevant data, folks could examine it on a
>>> virtual (or separate physical) machine, and find fix/solution(s)
>>> to the issue.
>>>
>>> Bits we'd need, or that would likely be quite helpful, to analyze
>>> and find a solution:
>>>
>>> Actual data - since I'm presuming it's still a "boot" issue, the
>>> failure occurs prior to the kernel successfully loading. Presuming
>>> it's software, that relatively narrows down the scope of data to
>>> something not too huge. You could upload it and make it publicly
>>> available (presuming no data bits within contraindicate such a
>>> move):
>>> detailed low-level partition information, e.g.:
>>> # sfdisk -uS -d /dev/sda
>>> (presuming legacy formatting, and sda)
>>> all the data from the start of the disk up to (but not including)
>>> the first filesystem/partition on the disk, or a few MiB of such
>>> data, whichever is less. Most notably that would include the MBR,
>>> and any other bits GRUB might squirrel away on the disk there.
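>>> A minimal sketch of grabbing that (the output filename, and a
>>> first partition starting at sector 2048, are merely assumptions -
>>> use the actual start sector from the sfdisk output above):
>>> # dd if=/dev/sda of=disk-head.img bs=512 count=2048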
>>> "boot" filesystem. If /boot is a separate filesystem, complete image of
>>> that filesystem, e.g.:
>>> # dd if=/dev/sda1 | bzip2 -9 > boot.fs.bz2
>>> If /boot is not a separate filesystem, but is on the root (/)
>>> filesystem, then instead provide:
>>> a full backup of the /boot contents (e.g. pax, cpio, or tar
>>> archive of the contents)
>>> and also:
>>> the first few MiB or so of the raw filesystem image (notably to
>>> get any bits GRUB sticks on there in "reserved" areas).
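>>> E.g., perhaps something like this (just a sketch - the archive
>>> name, the 4 MiB figure, and /dev/sda1 as the root filesystem's
>>> device are assumptions to adjust as applicable):
>>> # (cd / && tar -cf - boot) | bzip2 -9 > boot.tar.bz2
>>> # dd if=/dev/sda1 bs=1M count=4 > rootfs-head.img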
>>> And a dump of relevant information for that filesystem, e.g. if
>>> it's an ext[234] filesystem:
>>> # dumpe2fs /dev/sda1
>>> Or similar details if it's some other filesystem type.
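>>> For instance, were it XFS, perhaps something like (the device
>>> name again being merely illustrative):
>>> # xfs_db -r -c sb -c p /dev/sda1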
>>> Also from the root (/) filesystem, any other relevant GRUB
>>> configuration bits, e.g. often found somewhere under /etc - you
>>> can tar up and provide the relevant file(s) covering that.
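>>> E.g., perhaps (the exact paths are assumptions - whatever GRUB
>>> configuration actually exists on the system; for GRUB legacy it
>>> might be /boot/grub/menu.lst instead):
>>> # tar -cf grub-config.tar /etc/default/grub /etc/grub.d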
>>> Also, if md or LVM is used for any of those filesystems mentioned
>>> above, provide the relevant md/LVM information.
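>>> E.g., perhaps something like:
>>> # cat /proc/mdstat
>>> # mdadm --detail --scan
>>> # pvs; vgs; lvs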
>>> If hardware RAID is involved, we probably don't need that
>>> information - just what the OS/software logically sees of the
>>> drive(s).
>>> Also potentially highly helpful:
>>> the OS distribution and version being upgraded from when the
>>> issue occurred
>>> the OS distribution and version being upgraded to when the issue
>>> occurred
>>> Likewise, the GRUB versions being upgraded from and to when the
>>> issue occurred.
>>> If some of that version information might not be fully known,
>>> reasonable approximations (and indications of such) would still be
>>> quite useful, e.g. on/about YYYY-MM-DD was upgrading from
>>> <distribution> <version> to the then most current version (or
>>> version <version>); the to/from GRUB versions would be those
>>> applicable for the <distribution> versions going from and to.
>>> Also, some hardware information might be helpful - it probably
>>> doesn't need to be too detailed; most useful, I'd think, would be
>>> the size of host RAM, the CPU type/family (e.g. Intel 64-bit or
>>> 32-bit), and the drive controller type/interface
>>> (IDE/PATA/SATA/SCSI/...).
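>>> E.g., from a live CD on the host, perhaps something like:
>>> # uname -m
>>> # grep -m1 'model name' /proc/cpuinfo
>>> # free -m
>>> # lspci | egrep -i 'ide|sata|scsi|raid'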
>>>
>>> Anyway, I was thinking, if you're able to pull that data off the
>>> drive and upload it somewhere for us, we might well be able to
>>> figure out the boot issue and corrective measures - and it may
>>> involve fewer total person-hours in a cold garage working to
>>> determine a fix for the issue.
>>>
>>> Not at all that you have to :-) ... but I was thinking it might
>>> possibly get to "fix"(ed) sooner and easier that way ... and if
>>> nothing else, I thought it may be useful to illustrate to folks
>>> that such an approach can be used to diagnose issues and test out
>>> fixes to a software issue (at least if the issue doesn't have
>>> specific hardware dependencies).
>>>
>>> Also, in any case, having all that backed up can allow one to
>>> return to that state, if no changes beyond that data are made.
>>> (One does have to be rather careful, though, with "reserved bits"
>>> written in reserved areas of the filesystem, outside of (before)
>>> partitions, etc. I'm not spelling out all the details on that
>>> here.)
>>>
>>>> From: "Rick Moen" <rick at deirdre.net>
>>>> Subject: Re: It's a gift (not a newsletter) ; and an offer from SF-LUG
>>>> Date: Tue, 30 Dec 2014 13:56:21 -0800
>>>
>>>> On Tue, Dec 30, 2014 at 1:14 PM, jim <jim at well.com> wrote:
>>>>
>>>>> A couple of meetings ago, a few SF-LUG folks agreed to
>>>>> purchase some old box in good working order and with
>>>>> sufficient resources to host a MailMan system. Rick, if
>>>>> this offer will help you, please let us know: we're willing
>>>>> to find, vet, purchase, and deliver. I'm interested in
>>>>> seeing if I can provide an electrical processing system
>>>>> that can protect your machines from over- and under-
>>>>> voltage mishaps.
>>>>>
>>>>
>>>> Hey, thanks to all of you for the lovely and thoughtful offer.
>>>>
>>>> Thing is, I actually do have a bunch of hardware sitting in my
>>>> garage. At least one of them is very likely a functional 1U or 2U
>>>> rackmount server, which is the right sort of thing to use. (Many
>>>> desktop boxes have things about them that make them unsuitable,
>>>> such as many desktop machines' ATX power supplies not being able
>>>> to be configured to bring the machine back up without manual
>>>> intervention when the power returns after a power outage.)
>>>>
>>>> Just before I went on my last vacation, I moved the hard drives
>>>> from my server from the failed VA Linux Systems model 2230 to a
>>>> spare model 2230. To my relief, I got video and was able to boot
>>>> an Aptosid live CD. Even better, I was able to mount my server
>>>> system's partitions, verify that they were readable, and update
>>>> my backups of everything. Thus, at that point, I was no longer in
>>>> danger of having to revert to an old backup.
>>>>
>>>> Using the live CD, I then attempted to fix the software problems
>>>> that were the _other_ issue aside from failed hardware. (To
>>>> recap, I had been doing system updates, and (skipping some
>>>> details) the system segfaulted in the middle of the system
>>>> software upgrade. I cold booted, but there was from that point
>>>> forward no video at all, nor beeps, i.e., it acted as if I'd had
>>>> failure of the motherboard or other key system hardware.) I was
>>>> not able to find a way to make the system bootable through some
>>>> hours of experimentation - was getting some bizarre GRUB errors -
>>>> and had to defer the matter because I had to leave to catch our
>>>> flight to Barbados. So, I powered down the machine.
>>>>
>>>> When I got back from Barbados, I found something perplexing: I
>>>> heard the system fan running, and saw the blue power light on the
>>>> front panel, i.e., it was powered up (even though I'd left the
>>>> system powered down). However, despite that, there was no video.
>>>> Cold booting the system resulted in... no video. This was really
>>>> bizarre. The symptom suggested that there had been a power outage
>>>> during my time in the Caribbean, and upon the return of power, my
>>>> system had come online (I hadn't unplugged it, just powered it
>>>> down), and that there had then been a second and similar hardware
>>>> failure. But this seemed like an implausible coincidence, as
>>>> perhaps you would agree.
>>>>
>>>> Time and experimentation and use of careful logic can get to the
>>>> bottom of the matter. I just haven't lately had the patience to
>>>> do that, and have been quite busy with other commitments in the
>>>> meantime. Sooner or later, I _do_ plan on sitting out in my very
>>>> cold garage for as long as it takes. I certainly could give up on
>>>> debugging the VA Linux Systems gear, and just attempt to build
>>>> from scratch a replacement software configuration on one of the
>>>> other spare machines I have. I'd prefer not to do that, because
>>>> building a new server configuration instead of just tracking down
>>>> the one software problem that made my system unbootable is a
>>>> LARGE amount of extra work.
>>>>
>>>> And, thus, you'll notice, the resource I'm short on is not
>>>> machines, but rather time, patience, and focus on the problem.
>>>>
>>>> About over/under-voltage: Last year, concerned about that very
>>>> thing, I set about dealing with it. The first thing I did was to
>>>> buy an APC UPS unit over at Central Computer. However, this never
>>>> seemed like really the right solution, just the commercially easy
>>>> thing to acquire: A UPS isn't actually very good at dealing with
>>>> power fluctuations (and sometimes is useless at that, depending
>>>> on the type), and also interposes a new single point of failure
>>>> in the form of a big lead-acid battery that can, itself, bring
>>>> down your system. Also, the UPS generates quite a bit of heat,
>>>> which bloats your PG&E bill, and you have to buy replacement
>>>> lead-acid battery packs every few years, which are a large
>>>> percentage of the cost of the entire UPS, each time you have to
>>>> buy them.
>>>>
>>>> What the UPS mostly does - the problem that it exists to solve -
>>>> is bridge you across short-duration outages, making it so you
>>>> don't lose power and have continuous uptime. Continuous uptime is
>>>> abstractly nice, but is the thing I care least about: Linux
>>>> servers come right back up after power returns. That's what we
>>>> have journaled filesystems for. So, given that fact, why would I
>>>> want to put a continually expensive, heat-producing, potentially
>>>> problematic bit of hardware between the AC outlet and my unit,
>>>> one that isn't even very good at line regulation, and that can be
>>>> a Single Point of Failure that otherwise wouldn't exist?
>>>>
>>>> In short, I have not been in a hurry to deploy the UPS, because
>>>> it's mostly a solution to the wrong problem, a solution to a
>>>> problem I don't care about very much. On reflection, I realised
>>>> that the right solution is a line conditioner unit, not a UPS.
>>>> And I don't mean the miserable rubbish you can get at Fry's,
>>>> either. The problem was: Where do you get a line conditioner of
>>>> the variety that people acquire who are serious about the
>>>> problem?
>>>>
>>>> Last summer, I solved that problem: I went to the De Anza College
>>>> Electronics Swap, very early in the morning, and found a vendor
>>>> who was selling a ham-radio-grade line conditioner unit. I have
>>>> that with my gear, and expect to use it going forward.
>>>>
>>>> Thanks again.