The most recent version of this essay can be found at http://linuxmafia.com/faq/Licensing_and_Law/forking.html.

Fear of Forking essay

original version (corrected and annotated)

[1] (Footnote on overall context of this essay. Please read.)


From rick Sun Nov 14 16:13:06 1999
Date: Sun, 14 Nov 1999 16:13:06 -0800
From: Rick Moen rick
To: [several individuals at my former firm]
Subject: Essay for the Brown-Baggers: code forking
Message-ID: 19991114161305.C32325@uncle-enzo.imat.com
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0i
X-CABAL: There is no CABAL.
X-CABAL-URL: There is no http://linuxmafia.com/cabal/
X-Eric-Conspiracy: There is no conspiracy.
X-Eric-regex-matching: There are no stealth members of the conspiracy.
Status: RO
Content-Length: 20135
Lines: 381

Ed, I hope you would not mind forwarding this essay to the Brown Baggers mailing list. I was trying to finish this for the sales force's benefit before my departure, but ran out of time, in my rush to get my department in order.

WHY LINUX WON'T FORK
And why being able to fork is still A Good Thing.[2]

I noticed some puzzled faces when Nick[3] did his presentation on licences at a Brown Bag session, and talked about the right to fork source code. He pointed out that the right to start your own mutant version of any open source project (which is what we mean by "forking") is an important safeguard. He and I both stressed that the absence of that right in Sun's "SCSL" (Sun Community Source Licence), used for Java, Jini, and (potentially) Solaris[4] and Star Office is what prevents SCSL from being genuinely open source. (Borrowing a term from Eric S. Raymond, I called SCSL projects "viewable source".)

But this creates a puzzle for you guys[5]: I'll bet you have to work hard to fight customer fears that GNU/Linux [6] will fragment into a hundred incompatible versions because there's no single big corporation in charge. Right? And here Nick and I come, saying thank God open source licences guarantee everyone the right to do just that.

Sounds contradictory, right? OK, here's the quick and dirty answer. The detailed one comes later:

Linux won't fork because the fork-er has to do too much work for no payoff: Any worthwhile improvements he makes will be absorbed into the main branch, and his fork will be discarded/ignored as pointless.
The above happens with Linux, even though it hasn't with earlier projects, because of the effect of Linux's source-code licence.

NOTABLE PAST INSTANCES OF FORKING

1. Unix --> dozens of proprietary mutant corporate Unixes

If you've read up on Unix history, you know that Unix was a freak product of AT&T's Bell Labs division, around 1969. I'll omit most of the long story, but the most important fact to know is that AT&T was then operating under a January 24, 1956 Department of Justice anti-trust judgement [7] (which expired around 1980) prohibiting it from entering the computer/software business, and requiring it to reveal what patents it held and license them when asked. So, it could not legally sell Unix, but instead sold source-code licences (and occasionally also the right to use the trademarked name "Unix") to (1) universities, such as U.C. Berkeley, and (2) companies such as IBM, Apple, DEC, Data General, SGI, SCO, HP, etc.

Those companies bought the right to make their own Unixes: IBM released AIX. Apple did A/UX. DEC did Ultrix[8], OSF/1, and Digital Unix (later renamed "Compaq Unix" and now "Compaq Tru64 Unix"). Data General did DG/UX, SGI did IRIX, HP did HP-UX, and SCO did Xenix[9] which eventually mutated into SCO Open Server. And we could cite others, but I'll spare you.

The point is that these were the jokers who ruined Unix. Every one of them marketed his mutant Unix as "Unix plus" -- everything the other guys have and more. Needing to create differentiators, they deliberately made their Unixes incompatible while giving lip service to "standards".

For customers, this was simply a mess, and Microsoft drove right through these guys' disunity like a Sherman tank. It is the classic instance of forking that sticks in people's minds. Which is why you folks are expected to assure customers that the same won't happen to GNU/Linux. We'll return to this point later.


2. BSD --> FreeBSD, NetBSD, OpenBSD, BSD OS, MachTen, NeXTStep (which has recently mutated into Apple Macintosh OS X), and SunOS (now called Solaris)[10]

As I mentioned above, antitrust-limited AT&T, not being able to sell Unix itself, gave out very cheap Unix source-code licences [11] to universities including U.C. Berkeley. UCB's Computer Systems Research Group (CSRG) took the lead in the academic world: Having access to the source code, they quickly realised that they could rewrite it to make it much better, and slowly did so. Their rewrite was dubbed "BSD" (Berkeley Software Distribution), and they were glad to share it with anyone similarly having an AT&T Unix source licence.

And their work was generally a great deal better than Bell Labs's, partly because it benefited from worldwide peer review in a very open-source-like fashion.[12] Over quite a few years, they gradually replaced almost all of the AT&T work, without (at first) really intending to.

One fine day in 1991, grad student Keith Bostic came to the BSD lead developers, inspired by Richard M. Stallman's (remember him?) [13] GNU Project, and suggested replacing BSD's remaining AT&T work to create a truly free BSD. Dreading the confrontation likely to result with AT&T, they tried to stall by assigning Bostic the difficult part of this task, rewriting some key BSD utilities. This backfired when he promptly did exactly that. So, they grumbled but then completed the job[14], and tried to prevent AT&T from noticing what they had done.

AT&T did notice[15], panicked, and sued. That, too, is a long story best omitted. Under the stress of the lawsuit, freeware BSD split into three camps (FreeBSD, NetBSD, and OpenBSD).[16] But there were also several proprietary branches[17], made possible because U.C. Berkeley's "BSD Licence" allowed creation of those: Sun Microsystems's SunOS, Tenon Intersystems's MachTen, BSDI's BSD OS [18], and NeXT Computer's NeXTStep OS all came out for sale without public access to source, and were all based on the Berkeley BSD source code.

Note the distinction: If you write a program and release the source code under the GNU General Public Licence (GPL), other people who sell or otherwise release derived works that incorporate your work must release their source code under GPL conditions. The same is not true if you release your work under the BSD Licence: Anyone else can create a variant form of your work and refuse to release his source-code modifications. (In other words, he is allowed to create proprietary variants.)

A word about the three free BSD variants: All three were splinters from a now-dead project called 386BSD. All have talked about re-merging in order to save duplication of effort, but they now persist as separate projects because they've specialised: FreeBSD aims for the highest possible stability[19] on Intel x86 (IA32) CPUs, NetBSD tries to run on as many different CPU types as possible, and OpenBSD aims to have the tightest security possible. In other words, the 386BSD project remains forked because there are compelling reasons that make this a win for everyone.

Also, where possible, these three sister projects collaborate on tough tasks -- and they also collaborate with GNU/Linux programmers. Some of the best hardware drivers in the Linux kernel are actually BSD drivers. There's a high level of compatibility among the three BSDs and between them and GNU/Linux: Unlike the proprietary Unix vendors, BSD and GNU/Linux programmers have an incentive to eliminate incompatibility and support standards.


3.  emacs  -->  GNU emacs  
                            -->  Lucid emacs  --> xemacs
           -->  other proprietary emacsen, now mostly forgotten

The Emacs editor / programming environment (short for "editing macros") was originally written by Richard M. Stallman (with Guy Steele and Dave Moon) in 1976, in TECO macros and PDP-10 assembly, to run on ITS and TOPS-20 -- at that time, under no explicit licence terms. (Stallman has clarified that it did carry a statement that "People should send changes back to me so I could add them to the distribution.") It proved wildly popular, and by 1981 had started to give rise to explicitly proprietary variants, notably James Gosling's C-coded "Gosling Emacs". [The original version of this essay's section on Emacs forks was sadly muddled: I had confused this "Gosmacs" fork with others, in attempting to recall Emacs history solely from unaided memory, and my explanation went wrong from that point on. For this revision, I've replaced that entire section.]

In 1985, Richard Stallman resumed leadership, creating his flagship GNU Emacs version in C, based initially on Gosling's work, but replacing all Gosling code by mid-year, enabling Stallman to place the work under his newly written GNU General Public Licence, which he then did. At this point, mid-1985, Emacs's open-source history begins.

By 1991, Stallman's GNU Emacs had gone from major versions 15 through 18, with a number of point releases. NCSA originated a set of popular patches ("Epoch") to improve GUI support: GNU Emacs 19 was expected to merge Epoch's features cleanly.

So things stood as developers at Lucid, Inc. (who used Emacs with their proprietary C / C++ development tools) began participating in the GNU Emacs development effort, attempting to bring about version 19. For reasons that remain disputed (http://www.jwz.org/doc/lemacs.html), the Lucid developers and Stallman had difficulty cooperating, and the Lucid developers released their version as Lucid Emacs 19.0, in April 1992. (As a fork of GNU Emacs, it is likewise under the GNU GPL.)

The anomalous aspect of this rare fork of a GPLed work is not so much that it occurred as that it persists to this day: Lucid Emacs was renamed XEmacs in September 1994 (after Lucid, Inc. closed) and remains equally popular with Stallman's version. This appears to be a rare case of differences about working styles, design issues, and management policies outweighing the advantages of re-merging. However, even here, convergence occurs: Since much of an Emacs implementation's functionality exists as elisp macros, essentially all of that code is common to the two rival Emacs projects. And each benefits from studying the other's new features and code.


4. NCSA httpd --> Apache Web server

These days, the world's standard Web server package is the Apache package, maintained by the all-volunteer Apache Group. (That is not to say that they don't make money: Members of the Apache Group such as Brian Behlendorf have practically a licence to print cash, when it comes to Web consulting, because of their well-earned fame.)

But, before there was an Apache, you ran either the University of Illinois at Urbana-Champaign National Center for Supercomputing Applications's "NCSA httpd" (HyperText Transfer Protocol daemon) or Geneva-based CERN's (Conseil Européen pour la Recherche Nucléaire's) "CERN httpd". The NCSA daemon was smaller and faster, while the CERN one was famous mostly for association with the creator of the Web, Tim Berners-Lee, who worked as a researcher at CERN. [20]

CERN's httpd (later called "W3C httpd") was always under an early sort of free-software licence. It's no longer maintained -- a dead project. It's unclear what NCSA httpd's licence was originally, but when that project died (1996) its licence was a "free for non-commercial usage only" one.[21]

In any event, the story is that an on-line group of programmers who had been producing patches (modifications) for the NCSA httpd eventually decided that they'd produced their own variant in 1995, forking the code. "Apache" was originally just Brian Behlendorf's temporary code name for the project, but fellow developers then pointed out the name's appropriateness ("a-patchy" server = "apache"; get it?), and it stuck.

This is an instance of why and how open-source projects fork benignly, for good reason: Development at NCSA had stalled after the package's original creator, Rob McCool, left the Center. If that happened to a proprietary product, it would just die, leaving all its users in the lurch. However, because the product was so useful, the Apache Group forked the source code and kept driving it forward. It now dominates all Web servers, regardless of their marketing and development budgets.


5. gcc --> pgcc --> egcs --> gcc

Here's an odd one: Richard M. Stallman (remember him?) founded the GNU Project in 1984, which produced[22] the immensely important GNU C Compiler ("gcc"). gcc is designed to work on just about any remotely feasible computer, not just the Intel x86 (IA32) series. So, it might just have been other priorities that delayed improved Intel support. Specifically, well into 1997, the best the then-current gcc 2.7 series could do for code optimisation on Intel was to set the compiler for 486 chips. People pleaded with FSF for Pentium optimisation, but were stubbornly ignored.

So, two separate groups, in succession, developed Pentium-optimised compilers as forks from gcc 2.7. The first was "pgcc", from the Pentium Compiler Group, a consortium consisting mainly of Intel Corporation staffers. pgcc produced very fast code via a two-pass process, but was completely non-functional on gcc's non-Intel platforms, and for that reason could not be accepted into the main gcc code. Further, it departed so radically from the base gcc code that it proved difficult for pgcc to track gcc improvements.

However, the Pentium Compiler Group distributed its work widely, and its Web site remained available as a major resource on Pentium optimisation issues for interested parties -- so much so that my initial version of this section, based on memories of that site, inadvertently confused the Intel/PCG work with the later egcs work (discussed below). My thanks to Ian Lance Taylor of CYGNUS for helping me straighten out the account.

Perhaps inspired in small part by receiving a copy of pgcc, but more so by a desire to make their jobs easier, improve the compiler they worked extensively with, and broaden the development model to include more developers (than just Richard Kenner, who was in charge of gcc), programmers at the CYGNUS company of Sunnyvale, California (the one that was recently bought by Red Hat Software, Inc.) independently followed up pgcc with the more-successful egcs compiler. Unlike pgcc, egcs was only a modest departure from gcc 2.7, was equally portable, and was like gcc single-pass. And it was a very clear improvement over Kenner's 2.7 and 2.8 gcc series, not just in adding Pentium support.

For whatever reason, Stallman's Free Software Foundation (developers of the GNU Project) continued to act as if egcs didn't exist. So, GNU/Linux distributions began to emerge based on egcs, and the free-software world began to mostly ignore gcc.

This can be seen as a variant on the Apache experience. The ability to fork means that progress will not be impeded by a developer not wanting to move forward: Somebody else can, as gracefully as possible, assume the leadership role and (if necessary) fork the project.

However, this necessity was averted in the egcs case. In April 1999, the FSF re-merged egcs into the (would-be) main gcc branch, and handed over all future development to the egcs team (such that egcs 1.2 became gcc 2.95), thereby resolving the conflict.


6. glibc --> Linux libc --> glibc

This is a nearly mirror-image case. Any Unix relies extremely heavily on a library of essential functions called the "C library". Richard M. Stallman's (remember him?) GNU Project wrote [23] the GNU C Library, or glibc, starting in the 1980s. When Linus and his fellow programmers started work on the GNU/Linux system (using Linus's "Linux" kernel), they looked around for free-software C libraries, and chose Stallman's[24]. However, they decided that FSF's library (then at version 1-point-something) could/should best be adapted for the Linux kernel as a separately-maintained project, and so decided to fork off their own version, dubbed "Linux libc". Their effort continued through versions 2.x, 3.x, 4.x, and 5.x, but in 1997-98 they noticed something disconcerting: FSF's glibc, although it was still in 1-point-something version numbers, had developed some amazing advantages. [25] Its internal functions were version-labeled so that new versions could be added without breaking support for older applications, it did multiple language support better, and it properly supported multiple execution threads.

The GNU/Linux programmers decided that, even though their fork seemed a good idea at the time, it had been a strategic mistake. Adding all of FSF's improvements to their mutant version would be possible, but it was easier just to re-standardise onto glibc. So, glibc 2.0 and above have been slowly adopted as the standard C Library by GNU/Linux distributions.

The version numbers were a minor problem: The GNU/Linux guys had already reached 5.4.47, while FSF was just hitting 2.0. They probably pondered for about a millisecond asking Stallman to make his next version 6.0 for their benefit. Then they laughed, said "This is Stallman we're talking about, right?", and decided out-stubborning Richard was not a wise idea. So, the convention is that Linux libc version 6.0 is the same as glibc 2.0.


7. Sybase --> Microsoft SQL Server

Woody Allen has a saying that "The lion may lie down with the lamb, but the lamb won't get much sleep". Much the same can be said of companies that enter "industry alliances" with Microsoft Corporation. One of the several slow-learner corporations to make this mistake was Sybase Corporation, publisher of the Sybase Structured Query Language (SQL) database package for numerous Unixes and NetWare. As part of the alliance, Microsoft sold Sybase to its customers, relabeled as Microsoft SQL Server, and got access to Sybase's source code under non-disclosure agreement.

Then, predictably, Microsoft broke the alliance when it had learned all it could from Sybase, and reintroduced Microsoft SQL Server as its own product in competition with Sybase. I do not know if current MS SQL Server versions are rewritten from scratch or retain Sybase code under licence terms[26], so this may not be a legitimate case of forking (let alone open source), but it's similar enough that I thought I should mention it.


ANALYSIS: WHY OPEN-SOURCE FORKING IS BOTH RARE AND BENIGN

You, the reader, can fork any open source project at any time. This is absolutely not cause for alarm. Let's prove it: Get a copy of the current Linux kernel from ftp://ftp.kernel.org/. Rename it. Call it Fooware OS. Send out messages to everywhere you can think of, announcing that Fooware OS has splintered off from Linux, and great things are expected of it.

Wait for reactions. Wait some more. Listen to the clock ticking. Sort your lint collection. Open up the source code tree, think about what you might do with it, and wonder where you're going to find the time.

Well, that's a little unfair: You're probably not a programmer. Let's imagine that you are. You're a ninja programmer with mighty code-fu, a drive to succeed, and a disciplined team of programmer henchmen. So, you don't just listen to the clock tick, but get some really good work done. You improve the heck out of the kernel, in fact. And then the Linux people smile broadly, and quite sincerely tell you "Thank you very much." Like effective programmers the world over, they know programming is difficult work and are constructively lazy. That is, they're not proud, and are glad to use other people's work -- when that's allowed.

Oh, you forgot that your work was under the GPL, didn't you? By forking off, working on, and distributing your variant of a GPLed work (the Linux kernel), you consented to issuing your improvements under the GPL also, for other people to freely use. So, you only thought you were creating Fooware OS; in fact, you were creating a better Linux.

That's why forking is uncommon in open-source code, and even more so in (specifically) GPLed code: The improvements one group makes in its would-be "fork" are freely available to the main community.

But, as we have seen from the mostly non-GPL examples above, forking is nonetheless not only always an option, but is a vital safety valve in case the existing developers (1) stop working on the project, or (2) decide to stand in the way of progress. The fact that this can occur is A Good Thing.

A third reason for forking also exists, and may hit the GNU/Linux community eventually: specialisation. You may recall that this is what ultimately happened with the three free BSD variants -- although stress from the clash-of-the-titans AT&T v. U.C. Berkeley lawsuit arguably made that situation unique.

That is, somebody may eventually propose to the Linux kernel team some extension that's simply outside the scope of the project, and yet builds enough support behind it (and has enough reason for existing) that it proceeds anyway. In that case, Linux will fork -- and it will be a good thing, because then there will be two strong projects instead of one, each concentrating on an important niche that the other cannot fill.[27]

If that happens, the forks would undoubtedly share code and information exactly as the BSD variants do, to prevent duplication of effort, and because it makes sense to do so. And the world will be richer for both the fork and the sharing.

I hope this explanation proves useful to you. It's been a pleasure working with you all, and please do stay in touch.

-- Rick M.


[1] On the afternoon of Saturday, November 13, 1999, I was badly in need of a little diversion: I was sitting next to the hospital bed of my girlfriend, who had been abruptly dismissed from my firm (a major-name Linux firm that shall go nameless), three days earlier. On account of my own separate issues with management, I had just resigned my job as chief of the system administration department. I also had just gone through a week of sixteen-hour days to put my department in top condition (preparatory to leaving), plus I was suffering exhaustion and a week-old case of influenza. And now we were waiting the first of many hours for my girlfriend's blood-test results, and so I needed something to keep my mind occupied.

Fortunately, I had my laptop computer with me, and so I wrote out an essay I'd been working on in my head for the past week or so: I wrote it, beginning to end, in one long, continuous typing spree, over the course of several hours. Please note that I had no access to the Internet or any reference works; all contents were from my fatigue-addled memory. I was still exhausted the next day, and e-mailed out the completed text without meaningful checking.

That essay was intended not for the world at large, but rather for an internal mailing list at my former firm, as a parting gift to my friends in the sales department: A couple of months earlier, I had helped found a series of internal seminars, the Brown Bag series, with a presenter every Tuesday and Thursday briefing the sales department (and other interested employees) on different aspects of free / open-source software, to help them understand the GNU/Linux community, its history and customs, and many related, sales-relevant issues.

As an adjunct, we also set up a Brown Bag mailing list, where the sales force were encouraged to ask us techies anything they needed to know. To help matters, whenever a salesman asked me a good, fundamental question (which happened frequently, since all of them were impressively bright, quick learners), I would spend an evening writing it up in essay form, for the entire list.

So, my "Fear of Forking" essay (a title later suggested for it by Eric S. Raymond) was my final contribution, to close out that series of essays. I sent it to the Brown Bag list-owner, and cc'd two other former colleagues for their enjoyment. One of the latter asked if the firm could use it as a featured article, which they then did. That version was then picked up by Slashdot.org.

Please bear in mind that design goal, as you read the piece: It was not intended to be a history of Unix, nor a technical treatise on code forking, but rather a conceptual overview of the code forking issue, with brief historical examples, to explain to a sales team how and why free / open-source software differs from proprietary software in not having that problem -- to help them better do their jobs.

It might be worth mentioning that "Unix" was written as "UNIX" early in that OS's history -- for no good reason. (It's not an acronym.) Likewise for some similar non-acronym names such as XENIX and ULTRIX. I have rendered all in the more modern and grammatically-justifiable fashion.

[2] Staff at my former firm changed the subtitle for their version to "How the GPL Keeps Linux Unified and Strong" and wrote three different, replacement introductory paragraphs, for the version published at their Web site.

I generally like what they wrote, but -- in those three new paragraphs and the new subtitle -- they did inadvertently make the piece seem a bit of an advertisement and ideological argument for (specifically) the GNU GPL, which was not my intention. The BSD crowd would argue that although BSD licensing does allow proprietary code forks, those tend to be temporary and/or lose momentum because they cease to fully benefit from the exchange of code and information in the larger BSD community. I would strongly agree. History seems to support their claim.

[3] Nick Moffitt, Brown Bag lecturer and member of my staff at my former firm.

[4] An edit error on my part: In the essay as posted to the Brown Bag list, I included Solaris among those works whose source code had already been issued under the SCSL family of licences. As of that date, Sun Microsystems was clearly contemplating such a move but had not taken it: Solaris remained binary-only. So, I have fixed my essay text to list it correctly in the "potential" category. (Sun later promised it would genuinely open-source Solaris, and did exactly that in 2007, albeit with some third-party inclusions available only as proprietary, binary-only code, including many important device drivers.)

There is no one definitive SCSL licence text: Each instance is slightly different, such that Sun refers to SCSL as a family of licences.

In May 2007, finally, Sun also re-released almost all of the Sun Java toolkit under the GNU General Public License, leaving out only a few component libraries whose third-party owners refused to go open source. Those will remain proprietary and binary-only.

[5] Referring, in case it isn't obvious, to the firm's sales force.

[6] I use this term to refer to complete operating system packages based on the Linux kernel, and "Linux" to mean just the kernel. At various points, I also used the terms "free software" and "open-source software" interchangeably, considering them two terms for the same concept with different marketing emphases -- differences in emphasis that are often significant, but not here.

[7] Technically speaking, a consent decree as opposed to a judgement. Consent decrees are legal settlements under which a firm agrees to be bound by certain restrictions, in exchange for a regulatory agency agreeing to drop a legal action against it. The firm gets to avoid proceeding to judgement, and a milder penalty than it might suffer at a judge's hands. The agency gets a quicker settlement, and avoids the always-present possibility of eventually losing, not to mention the certainty of greater legal costs.

[8] Long-vanished DEC Ultrix was actually based on Berkeley Unix (BSD), and therefore only indirectly on AT&T Unix. (Does anyone really care?)

[9] Xenix was a very limited and much-derided Unix variant for 80286 (AT-class) machines, written by a team at Microsoft, whose division immediately split off as Santa Cruz Operation (SCO), taking the product (such as it was) with them in exchange for giving Microsoft (if memory serves) 40% of their stock -- a kind of reverse acquisition. I hadn't forgotten this history in writing my essay; it's just one of many fine details that I saw no need to get into.

(That "SCO" company, based in Santa Cruz, California, which eventually renamed itself to Tarantella, sold off its unwanted OS business division, and was merged into Sun Microsystems, should not be confused with the more recent SCO Group of Lindon, Utah. The latter company, which had started life as Linux firm Caldera Systems, Inc., in 2001 purchased the "SCO" name and related unwanted OS division, exited the Linux business, and since then has attempted rather poorly to be a proprietary Unix company, but has become better known for its even more doubtful lawsuit efforts.)

[10] There's a tedious and well-worn flamewar over SunOS vs. Solaris that I was hoping to skirt entirely. Naturally, it became a prime topic on the Slashdot thread, nonetheless:

Originally, SunOS (through version 4.1.0) was a variant form of Berkeley Unix (BSD). Then, Sun Microsystems developed Solaris as (approximately) an outgrowth/extension of SunOS, but in so doing discarded the existing BSD-based codebase, rewriting it on an AT&T System V Unix foundation. Nonetheless, Sun maintains to this day that Solaris incorporates SunOS, and the output of Solaris's "uname" command asserts this, too. For their part, hordes of BSD-loving Sun users continue to despise Solaris, and use the term "SunOS" to mean pre-SystemV versions.

Given that my essay was not intended to be a history of Unix, it certainly wasn't intended to delve into this morass of definitional warfare.

[11] For a decade or more, the academic licence fee amounts were at token levels, but then were raised astronomically when AT&T emerged from the antitrust consent decree, and was able to sell Unix directly.

[12] Arguably, Sun Microsystems was trying to establish a similar sort of community, collectively bound by a uniform proprietary licence but able to work with shared code openly, with its SCSL. Free / open-source software advocates, during Sun's SCSL years of the late 1990s and early 2000s, often accused Sun of attempting to subvert their community using SCSL, but I believed the charge was erroneous -- and my view was vindicated in 2006-7 when Sun went substantively all-open-source with its main offerings.

[13] This phrase is a private running gag I had with some of the firm's sales force, after Richard Stallman addressed them in June 1999: RMS's name kept surfacing in my explanations of software history and primary issues to the sales representatives, and I kept having to confirm that this was indeed the same young but ubiquitous Richard M. Stallman they had heard speak.

I should stress, also, that in referring to Richard (in this essay to my former company's sales staff) as "stubborn", I meant that term strictly in an admiring and laudatory spirit. This, too, was a running gag we had kept going: When, in mid-1999, I had first explained Richard's history and long-term programme to the sales staff, a week before he was scheduled to visit our firm and give a guest lecture, I had started by quoting from George Bernard Shaw's introduction to Man and Superman: "The reasonable man adapts himself to the world; the unreasonable man persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man."

[14] Which entailed identifying and replacing all code that either was definitely or might be AT&T's, in the BSD system.

[15] There were several things that tipped off and provoked AT&T, such as the use of something resembling an AT&T "Death Star" corporate logo on the cover of a then-recent BSD release, but apparently the final provocation was tiny proprietary BSD-offshoot BSDI's January 1992 newspaper advertisement that advised readers to call "1-800-ITS-UNIX". There is much more to this story, but it is told better elsewhere.

[16] This statement is vague -- intentionally so, and in several ways at the same time. (Again, this was never intended to be a history of Unix.) At the time, the lawsuit's allegation of infringement of AT&T source-code copyright cast a shadow over not just Berkeley's BSD and BSDI's BSD/OS, but also William and Lynne Jolitz's free-software 386BSD project, which was a patchkit of about 180 patches based on Berkeley's "NET/2" BSD release. (In retrospect, there is grave doubt about whether AT&T's complaint ever had merit, on several grounds, but this was not clear at the time. Kirk McKusick told me that defendants ultimately agreed in the settlement that seven BSD files were encumbered by AT&T copyrights only to let AT&T save face on what would otherwise have been a dead loss, allowing that concession because the seven files were dreadful spaghetti code and overdue for replacement anyway.) Apparently in part because of the legal threat, the Jolitzes withdrew from the project, leaving copyright questions and a leadership vacuum on top of the project's other problems.

Two separate groups split off (setting up real source repositories and version control) from the stagnating 386BSD patchkit project: first NetBSD and then FreeBSD, both in 1993. In May, 1996 (after the January 1994 legal settlement), OpenBSD emerged as an offshoot from NetBSD.

Nov. 2004 addendum: I do not in general update this essay beyond attempting to fix errors as of its original publication date, but for completeness's sake will mention that, in July 2003, longtime FreeBSD core committer Matthew Dillon and others forked off DragonFly BSD based on the FreeBSD 4.x architecture, because of serious dissatisfaction with FreeBSD 5.x.

A complete roster of open-source BSD forks must also include Apple Darwin, for a total of five surviving forks. Darwin is the core of Apple Macintosh OS X, and comprises NeXT, Inc.'s old NeXTStep / Openstep code built around NeXT's xnu kernel, updated and retrofitted with an emulator for legacy MacOS code.

The "stress of the lawsuit" (my phrase) was clearly a leading force behind this overall chain of events. Those (on Slashdot) who interpreted that phrase as meaning that the lawsuit directly and unambiguously caused each step in it drastically over-interpreted my wording, to an extent I would have scarce thought possible previously.

[17] As footnoted elsewhere, DEC Ultrix was also among these.

Some commentators interpreted this passage as claiming that the AT&T lawsuit likewise caused all of the proprietary forks, all at the same time and simultaneously with the free-software forks. Unfortunately for this odd interpretation, I nowhere said, and certainly did not mean, any part of that.

Moreover, there seems to be a persistent misapprehension that I had implied some time-line for the history of the Unix operating system, hidden in that paragraph, since a number of commentators remarked on the time-line's (supposed) inaccuracy. I checked; there's no time-line. And I can state from personal knowledge that it's not missing on account of my forgetting it.

If you think that comment (just preceding) sounds ludicrous, you should have seen the Slashdot posts and e-mails that made it necessary. Maybe they're crazy, maybe I'm crazy: You decide.

[18] Which was at first named BSD/386, not to be confused with the Jolitzes' free-software / open-source 386BSD.

[19] I originally wrote "performance", here, but then was admonished by early readers that the performance advantages of FreeBSD are an incidental effect of it being designed for stability. So, I changed the wording -- and was then criticised for not mentioning FreeBSD's performance. Some days, you just can't win.

Suffice it to say that the world's record for maximum 24-hour performance for an ftp server is held by a FreeBSD box (wcarchive.cdrom.com, at now-defunct Walnut Creek CD-ROM, Inc.): 1.39 terabytes of data served to the public in a single day by a single Pentium II box, on Sunday, May 23, 1999. Each time this record has been broken, for years, it's been by FreeBSD, and always breaking a record previously set by a FreeBSD box.

It should also be mentioned that FreeBSD is increasingly portable to non-Intel CPU architectures, notably DEC Alpha and (more recently) Power PC.

[20] In fairness, it's also relevant that the CERN HTTP daemon was for quite a while the only such software extant.

Its licensing history is more than a little fuzzy, since source archives are available at http://www.w3.org/Daemon/ only of the final v. 3.0A version (July 23, 1996). That version includes an MIT copyright/licence notice resembling the MIT/X Consortium licence, plus a requirement that any products based on the codebase credit CERN. Those interested in researching the full licence history will probably find sufficient records within the CERN W3 Project's historical archives at http://www.w3.org/History/.

[21] Some correspondents have disputed this assertion, on the basis of their own fading recollection. I was speaking on the basis of licence terms for the final version (1.5.2a, released 1996) I found at http://hoohoo.ncsa.uiuc.edu/docs/COPYRIGHT.html; they're valid for "academic, research and internal business purposes only". Brian Behlendorf of the Apache Group corresponded with me in e-mail about the above-cited terms, and clarified: "That is not the license that was on NCSA [httpd] 1.3. That license said, 'This code is in the public domain, and give us credit'."

Upon examination of the source archives at ftp://ftp.ncsa.uiuc.edu/Web/httpd/Unix/ncsa_httpd/, it appears that NCSA's switch to proprietary licensing occurred between June 22, 1995's v. 1.4.2 and March 21, 1996's v. 1.5c, without any mention in the documentation. Version 1.4.2 included the following statement in its README file, matching Behlendorf's recollection: "This code is in the public domain. Specifically, we give to the public domain all rights for future licensing of the source code, all resale rights, and all publishing rights. We ask, but do not require, that the following message be included in all derived works: Portions developed at the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign. THE UNIVERSITY OF ILLINOIS GIVES NO WARRANTY, EXPRESSED OR IMPLIED, FOR THE SOFTWARE AND/OR DOCUMENTATION PROVIDED, INCLUDING, WITHOUT LIMITATION, WARRANTY OF MERCHANTABILITY AND WARRANTY OF FITNESS FOR A PARTICULAR PURPOSE."

[22] No offence whatsoever is intended by this phrasing to various actual project leads including Richard Kenner (for v. 2.5.x through 2.8.x) and the GCC (née EGCS) Steering Committee (for later versions). As with (almost) all FSF projects, it is produced by volunteers under FSF imprimatur.

My initial account of the Pentium Compiler Group's history was based on what had been posted on their now-vanished development Web site, and inadvertently annoyed some PCG participants from Intel by not citing their role concerning pgcc.

Some time in the year 2001, the GCC acronym was retroactively re-construed to mean GNU Compiler Collection, recognising the addition of compilers for numerous other languages (beyond C), including Java.

[23] No offence whatsoever is intended by this term to project lead programmer Ulrich Drepper and other volunteers, who do all the work, with FSF giving their imprimatur to it.

[24] Again, I mean this solely in the figurative sense of it being "Stallman's" in the sense of him being the founder, head, and motive spirit of FSF, which is the sponsoring organisational umbrella for the GNU libc effort. Ulrich Drepper and other volunteers should be credited for the actual coding and debugging work.

[25] Richard Stallman sent me a much better and more-lucid explanation: "We wanted to merge in their changes, but the person who maintained that version didn't want to explain the changes, write change logs, or even sort out which changes had come from which authors so we could get legal papers. But we considered support for GNU/Linux important, so the FSF hired a person to redo that work. Then we came out with GNU libc 2.0, which worked in GNU/Linux 'out of the box'." For his part, Ulrich Drepper, the project lead, asserts that FSF paid someone only to work on glibc for the HURD kernel, and explicitly forbade that person to work on Linux support. Drepper adds that FSF never to his knowledge supported the work on libc for Linux in the first place, and in fact actively pressured him to work on HURD support instead of Linux support.

[26] An experienced Sybase programmer subsequently wrote to me with the following: "In case no one has yet clarified the case about the fork between Sybase and Microsoft's SQL Server, SQL [Server] 7 is indeed a complete rewrite from the ground up. Microsoft finally outgrew the Sybase code (which considering its age held up pretty well IMO). All versions before that (including 6.5 - probably still installed at more places than it should be) are merely the forked versions of Sybase."

[27] In fact, an example of just that has already existed for quite some time, in the Embedded Linux Kernel Subset (ELKS) project (http://www.elks.ecs.soton.ac.uk/), a highly modified Linux variant that runs on the original IBM PC, XT, and compatibles, i.e., on antique 8086- and 8088-class machines that, lacking memory protection, cannot run real, full-fledged Unix-like operating systems. Also, in the medium term, Linux ports to some of the more difficult CPU architectures, including even the Power PC, are only rarely resynchronised to Torvalds's kernels, and might be seen as at least temporary forks. Ditto Alan Cox's series of patches, which can be seen as a parallel development path to the main kernel series.

Last, Nick Moffitt has pointed out another ongoing cause of temporary forks, brought to his attention by his own GAR software-build/packaging system: the very act of packaging software for a particular OS or distribution. Inevitably, non-portable changes must be made to some packages to make them install and function on various distributions and OS platforms.


Copyright (C) 1999-2007, Rick Moen.
Verbatim copying and redistribution of this entire article are permitted in any medium provided this notice is preserved.

Last modified: June 19, 2007
(Note: Please don't request additions/corrections beyond making the piece correct a/o my original 14 Nov 1999 publication date. This essay is not a newspaper.)

Rick Moen
rick@linuxmafia.com