[conspire] Sat, 1/10 Installfest/RSVP

Rick Moen rick at linuxmafia.com
Tue Jan 13 03:05:10 PST 2009


Er, sorry, Nick.  That escaped before I had the rest of it edited.

Quoting Nick Moffitt (nick at zork.net):

> And you at least had a simple chain of communication about the BALE
> problem, too.  You didn't have different departments each assuming
> they'd done the right thing and that the other department was the group
> who broke the public Web site that was just given mention in _The
> Journal of Record and Popular Gossip_.

A $FIRM I cannot specify but currently know very well in a professional
context has some real nightmares that I can't even think about getting
into details about, but suffice it to say that there are coordination
problems, that staging servers and VCS merely put a dent in it, and that
OS/toolset/OS-libs problems don't even _rate_, relatively speaking.
(Sorry, can't be more specific.  Would love to, but not at this time.)

> It was you and Deirdre, likely sitting side-by-side, debugging the
> problem and rolling out a fix.

I'm glad to give credit where due, here:   Even though I'd gotten to the
point of understanding every line of what we had, and debugged it in pieces to
the point where I knew _where_ the inserts weren't sticking, I was stumped
as to _why_.  Deirdre figured it out from pure logic:  IIRC, she
reasoned that her code hadn't changed, so something that _did_ change
must have caused the problem, which led to looking at the effect of
package upgrades, which pointed to the MySQL upgrade, which led to the
upstream ChangeLog and the necessary clue.  Or something like that.

I had stupidly forgotten about the possibility of the Web app breaking
even though the site code was unchanged and all we'd done was upgrade
the box from one set of correctly working apps to another.  It was a
valuable lesson.

> I agree that reasonable people could continue to operate at that larger
> scale, keeping the priorities and techniques you employ, and do a great
> job at it.  But I don't think it's fair to characterize being cautious
> about the above scenario as "foolishness".

That word reflected my having seen the cost, at a prior employer that
is still stuck on antique enterprise-Linux versions carried forward long
past reason, notably RHEL3 Update 5 (and Update 8).  Red Hat was
(perhaps still is; haven't checked) claiming to "support" that thing
through selective package updates in RHN, but it had long since become
ridiculously creaky, and the only people who couldn't face facts were
the company's management, whose policy decreed that RHEL3 would remain a
"supported platform", and if facts such as broken key system calls stood
in the way, so much the worse for the facts.

> You still have a bit of a struggle with the "Congratulations in-house
> developers: you're writing for a constantly-moving target!" message, but
> I certainly wouldn't characterize this approach as "foolishness" either.

See, I don't think that should necessarily follow.  At the firm I cite,
some internal developers kept insisting that the company's officially
blessed variants of RHEL3, 4, and 5, and of SLES9 and 10, could not be
permitted to have upgraded kernels (which my group, which maintained
those images, needed to upgrade for hardware-support and similar
reasons).  The developers kept succeeding in blocking migration to
newer kernels, even though there was no rational reason why their
strictly userspace code should have kernel sensitivities.  They claimed
their stuff might break.  We said, "If it does, it's embarrassingly
broken and _that_ is the problem you should fix without delay."  They dug
in their heels and claimed there would be lost revenue.  They won.

That's corporate politics, but it was mind-numbingly stupid.  There was
nothing whatsoever in their code that should not have been compilable on
any release of RHEL{3|4|5} or SLES with their simple dependencies fully
and robustly specified by major and (sometimes) minor versions of named
libs.   Everything they wrote should have worked trivially on any
X11-based Linux or BSD.  The refusal to clean up their code, and the
resulting loss of rational decision-making, easily qualifies as
"foolishness" and more.

And damned near any codebase you can cite that can be rationally
expected to break just from an orderly distro upgrade of something with
a functional policy -- even Ubuntu, which I think has yet to prove
itself as reliable as I'd like -- is going to rest on an equivalent sort
of underlying foolishness.  (My view; yours for a small fee.)




