BayLISA meeting notes
17 September 1998
by Rick Moen
(BayLISA is the San Francisco Bay Area branch of the nationwide group LISA -- Large Installation System Administration -- and has its Web page at http://www.baylisa.org/ .)
The panel took the form of 10-minute presentations by each of
the panelists in turn, followed by a collective Q&A
session. Some panelists tended to use overhead foils with
tiny, illegible print, so the information on them was
simply lost on the audience. Suggestion to presenters: If
you use visual aids, stick to larger point sizes, and give the
audience a URL where they can look up your presentation
materials.
[Late note: Bryan McDonald of the BayLISA Board informs me that we're trying to get the overhead slides for Web presentation, and this will be announced when ready.]
[Even later note: No sign of the slides, after several months.]
To recap, this was to be a panel on how to set up/configure high-reliability, high-performance Internet servers on sundry Intel Unix/Unix-like OSes.
1. Paul Vixie (Internet Software Consortium founder & much
more, re: BSDI's BSD OS): Likes all of {Free|Net|Open}BSD, but
prefers BSD OS because it _doesn't_ change often. He always
uses the last patch level of the prior major release, to
maximise stability. For example, BSDI has now come out with
4.0, so he runs the most recent (final) patches of 3.1. He has
several machines in Palo Alto running routed, gated, and
screend, and some others running the T1 and doing Kerberos
authentication.
He _doesn't_, however, run BSD OS for the root nameserver. That machine needs 1 GB of RAM, which he figures NetBSD could easily handle, but he runs it instead on an Alpha under Digital Unix, because Digital was kind enough to donate the hardware.
He made reference to his fabled page (http://www.vix.com/pc-hw/ ) of hardware recommendations for BSD OS, which has been much used by BSDI and others.
2. Jason Thorpe (NetBSD kernel developer). Says you really need
256 MB of RAM. Likes the Mylex/Buslogic MultiMaster BT-958 SCSI
host adapter. Adaptec 2940U is OK. Recommends avoiding Adaptec
AIC-7890 chipsets, since the support isn't quite there yet.
(An audience member piped up that the "unstable" tree's driver
handles them OK.) Uses Seagate Hawk drives because they don't
catch on fire (no doubt referencing Seagate's "hot offering",
the 10,000 RPM Barracuda series). Likes Digital DEFPA FDDI
adapters, Bay Networks Netgear 10/100 ethernet NICs ($30 at
Fry's), which are based on the DEC Tulip chipset. [RM adds:
Beware! Very recently, NetGear has started shipping units with
its own chipsets that almost but not quite emulate DEC's,
without changing the model number or S/N series.] Intel
EtherExpress 10/100 NICs are OK, too.
Always stripes drives, and uses multiple SCSI host adapters. Likes serial consoles, for remote recovery.
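Striping (RAID 0) deals consecutive chunks of a volume out across several drives, so one long transfer or several independent requests can keep multiple spindles busy at once. The sketch below shows only the address arithmetic, assuming a fixed stripe-unit size counted in blocks; the function name and parameters are hypothetical and not taken from NetBSD's ccd driver or any other real implementation.

```python
def stripe_map(logical_block: int, num_disks: int, stripe_unit: int) -> tuple[int, int]:
    """Return (disk index, block offset on that disk) for a RAID 0 volume.

    Hypothetical layout: stripe units of `stripe_unit` blocks are dealt out
    round-robin across `num_disks` drives, starting with disk 0.
    """
    if num_disks < 1 or stripe_unit < 1 or logical_block < 0:
        raise ValueError("invalid geometry or block number")
    unit_index, offset_in_unit = divmod(logical_block, stripe_unit)
    disk = unit_index % num_disks      # which spindle gets this stripe unit
    row = unit_index // num_disks      # how many full stripes precede it
    return disk, row * stripe_unit + offset_in_unit


# Three drives, 16-block stripe units: consecutive units land on different
# disks, so a long sequential read keeps all three drives busy.
for block in (0, 15, 16, 32, 48, 50):
    print(block, stripe_map(block, num_disks=3, stripe_unit=16))
```

Block 48 wraps back to disk 0 at offset 16, i.e. the start of the second full stripe.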
NetBSD has a "packages" system similar to Jordan K. Hubbard's "ports" system in FreeBSD. (They couldn't call it "ports" because in NetBSD that term already means a port of the OS to a hardware platform.)
He listed a large number of daemons & network services for NetBSD on barely-legible overhead foils, there & gone too quickly to take notes, and then concluded by discussing NetBSD tuning, which I did not attempt to transcribe.
3. Matt Dillon (one of the founders of Best Internet, sometime
Linux guy, member of the FreeBSD core team): Best has 45
FreeBSD rack-mount hosts in production service. They tried, at
first, to use a couple of SGI hosts, which didn't work out.
Those cost $5 million, and were replaced by the 45 PCs costing
$200,000, which are twice as efficient and twice as fast. Their
first Intel production systems, back in the SGI days, were
Pentium 90 motherboards, which they didn't find robust. The
Pentium Pro-based systems that followed were a major
improvement, adding ECC support among other things: the SGIs
were eventually replaced with ASUS Pentium Pro 200 systems with
ECC RAM, holding a maximum of 256 MB RAM each.
These systems run in serial-console mode, with the kernel debugger available. Found he didn't need multiple SCSI cards per host: ISP workloads are seek-limited, and don't come close to the roughly 133 MB/sec peak bandwidth of PCI. Tagged command queueing and disconnect help with many concurrent users. (There's no way to do this with IDE.)
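The "seek-limited, not bus-limited" argument is back-of-envelope arithmetic. The drive figures below (about 10 ms per random request, 8 KB per request, four drives) are hypothetical, chosen only to show the order of magnitude; the PCI figure is the standard 32-bit, 33 MHz peak.

```python
# Back-of-envelope check of "seek-limited, not bus-limited".
# All drive figures are hypothetical, chosen only for illustration.
PCI_PEAK_MB_S = 33.33e6 * 4 / 1e6   # 32-bit PCI at 33 MHz: about 133 MB/s peak


def random_io_mb_s(drives: int, ms_per_io: float, kb_per_io: float) -> float:
    """Aggregate MB/s when every request pays a full seek plus rotational delay."""
    ios_per_sec_per_drive = 1000.0 / ms_per_io
    return drives * ios_per_sec_per_drive * kb_per_io / 1000.0


# e.g. four drives, ~10 ms per random request, 8 KB per request
load = random_io_mb_s(drives=4, ms_per_io=10.0, kb_per_io=8.0)
print(f"{load:.1f} MB/s needed vs {PCI_PEAK_MB_S:.0f} MB/s available on PCI")
```

Under these assumptions a random-access workload uses only a few percent of the bus, which is why a second host adapter buys little.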
In FreeBSD 3.0 (beta), the CAM SCSI layer works around a vexing 2940UW hardware-FIFO bug by using DMA and avoiding the FIFO. 3.0 also queues more SCBs (SCSI Control Blocks) per host adapter: 16-20 per adapter under real-world conditions, instead of a maximum of 4 in 2.2.x. The elevator algorithm was helpful for a long time, but modern SCSI disks don't necessarily store sectors in rotational order, so it's no longer useful. Queueing more SCBs helps in that area, since it lets the drive itself reorder the outstanding requests.
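For reference, the classic host-side elevator sorts pending requests by block number and sweeps the head one way, then back. This is a minimal textbook sketch (the LOOK variant), not FreeBSD's actual queueing code; it bakes in the assumption that block numbers track physical position, which is exactly what modern drives no longer guarantee.

```python
def elevator_order(pending: list[int], head: int, ascending: bool = True) -> list[int]:
    """Service order for pending block requests under a LOOK-style elevator.

    The head sweeps in its current direction serving every request on the way,
    then reverses and serves the rest. Assumes block number ~ physical position.
    """
    if ascending:
        ahead = sorted(b for b in pending if b >= head)
        behind = sorted((b for b in pending if b < head), reverse=True)
    else:
        ahead = sorted((b for b in pending if b <= head), reverse=True)
        behind = sorted(b for b in pending if b > head)
    return ahead + behind


print(elevator_order([98, 183, 37, 122, 14, 124, 65, 67], head=53))
```

With tagged queueing, the host can instead hand many of these requests to the drive and let its firmware, which knows the real layout, choose the order.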
Likes Adaptec, Symbios/NCR, and Mylex/Buslogic SCSI host adapters, and almost any 10/100 ethernet card, e.g., ones with DEC or Intel chipsets. Uses PCI only (avoid VLB and ISA like the plague; EISA is mostly obsolete), with ECC SIMMs. Uses _some_ new Pentium IIs, but finds their CPU performance unimpressive because of the slow L2 cache; at least they can hold more RAM and have more memory sockets, so you can put more RAM on a system without having to use the highest-density SIMMs/DIMMs.
Doesn't use NFS; no common file store, no RAID; one 100Base-T network. Failures are rare, and only the occasional disk failure is serious: in one or two cases, recovery meant restoring from backup. Ethernet goes to a Cisco Catalyst switch. FreeBSD has somewhat higher CPU overhead than NetBSD. Having a 100Base-T backbone helps protect against denial-of-service attacks such as smurfing, which happens a couple of times per week.
100Base-T also helps with tape backup. The tape machine is the only machine with two SCSI cards, so that the tape chain is separate from the disk chain. It is also one of only two RAID 0 machines. He's using Diablo (his own feed-only news server, still in late beta) for the netnews feed, on a dual-processor SMP machine with three 18-GB disk drives in a striped configuration and soft updates enabled on the filesystem. This is a test machine for (among other things) FreeBSD beta, so it runs all possible experimental FreeBSD code. FreeBSD's configuration and tuning are similar to NetBSD's, but it has a totally different virtual memory system (which he detailed).
4. Jim Dennis (re: Linux -- "Linux Answer Guy" columnist in
Linux Gazette): Has used Linux since late 1991. [RM adds: Linus
Torvalds put out the first Linux kernel for public ftp in
September 1991.] He used Coherent before that. [RM adds: Yet
another small Unix-like OS, a low-cost proprietary offering
from the Mark Williams Company, now defunct.] Before that, he
worked at Quarterdeck supporting DESQview on DOS.
Linux evolves rapidly, but it's not necessary to evolve with it. (Anecdotes about machines running older Linux builds with very long uptimes.) Typical uptime on his personal system is 3-4 months, until he wants to change something fundamental, usually by compiling a new kernel. Production systems have longer continuous uptimes, but his point is that Linux is reliable enough that uptime figures aren't a meaningful measure. Points to the High Availability HOWTO at the Linux Documentation Project (LDP): http://sunsite.unc.edu/pub/Linux/ALPHA/linux-ha/High-Availability-HOWTO.html
Describes "Beowulf" clustering, and stresses that its methods should work on any *ix kernel, although they've most often been implemented on Linux. Moving to more down-to-earth concerns (system monitoring, capacity planning, alerts): round-robin DNS, traditional redundancy via MX records, NIS master/slave setups, etc., will help in those areas. Some newer technologies, such as the Coda distributed filesystem, will help in the future. (Hopes that Coda will replace NFS.)
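Both DNS-level redundancies mentioned above are simple to state. A sending mail server tries the MX hosts for a domain in ascending preference order and falls back to the next if one is down; a round-robin DNS server rotates the order of its address records on each answer so clients spread across the servers. A minimal sketch of the client-side and server-side behaviour, not BIND's or any mailer's implementation; the host names and 192.0.2.x addresses are placeholders.

```python
from collections import deque
from typing import Callable, Iterable, Optional


def mx_try_order(mx_records: Iterable[tuple[int, str]]) -> list[str]:
    """Hosts in the order a sender tries them: lowest MX preference first.

    (Real mailers also shuffle hosts of equal preference; omitted here.)
    """
    return [host for _, host in sorted(mx_records)]


def deliver(hosts: Iterable[str], is_up: Callable[[str], bool]) -> Optional[str]:
    """Return the first reachable host, falling back down the MX list."""
    for host in hosts:
        if is_up(host):
            return host
    return None


class RoundRobin:
    """Answer each lookup with the address list rotated by one."""

    def __init__(self, addresses: Iterable[str]) -> None:
        self._ring = deque(addresses)

    def answer(self) -> list[str]:
        reply = list(self._ring)
        self._ring.rotate(-1)   # the next client sees a different first address
        return reply


mx = [(20, "backup.example.com"), (10, "mail.example.com")]
print(deliver(mx_try_order(mx), is_up=lambda h: h != "mail.example.com"))

web = RoundRobin(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
print(web.answer()[0], web.answer()[0], web.answer()[0])
```

Here the primary mail host is treated as down, so delivery falls back to the backup MX, and three successive lookups each get a different first address.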
High-availability configurations are somewhat exotic, and Beowulf clustering does not benefit most common business applications and services: the calculation-intensive jobs it suits, such as astronomy and particle-physics simulations, are rare compared with workloads like Web serving.
Linux 2.1.x development kernels have been underway for a very long time: 1.5 years. (Described kernel-build process.)
Hardware requirements: Linux will run on anything; a 386 with 16 MB RAM suffices for, e.g., a router. One fellow built a system with a 2.1.x kernel on 4 MB RAM. Linux can also run from a single floppy without a hard drive: references Tom Oehser's "Tom's Root/Boot" floppy and the Linux Router Project.
Any old PC piece o'junk will run Linux; for driver support on new (recently introduced) hardware, check the Red Hat or LDP hardware compatibility lists. Can get Linux preinstalled from many firms in the Valley and elsewhere (PromoX, VA Research...). Likes DEC Tulip ethernet chipset, especially on NetGear cards. Likes ASUS SMP motherboards, Mylex/Buslogic BT-958 SCSI host adapters, watchdog timer hardware (see below).
Recommends using just one SCSI host adapter, except for tape drives. Likes the new LM78 chip, which monitors fan speed, voltages, and temperatures inside the case. The WDT500-P and WDT501-P watchdog timer interface hardware from Industrial Computer Source (San Diego) is supported in the kernel (which provides /dev/watchdog), and can reboot or shut down the machine if it overheats, has voltage problems, or fails other health checks.
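Software drives a watchdog like this by opening /dev/watchdog and writing to it periodically; if the writes stop, the hardware timer expires and resets the machine. Under the usual Linux watchdog API convention, any write restarts the countdown, and writing the character "V" just before closing asks the driver to disarm ("magic close"), if the driver supports it. A minimal sketch of a keepalive loop; the function name and the `healthy` callback are hypothetical, and a real daemon would keep the device open indefinitely.

```python
import time
from typing import Callable

MAGIC_CLOSE = b"V"   # Linux watchdog convention: disarm on close, if the driver supports it


def pet_watchdog(device: str, beats: int, interval_s: float,
                 healthy: Callable[[], bool], disarm: bool = False) -> int:
    """Write keepalives while `healthy()` returns True; return how many were sent.

    If the health check fails, the loop stops writing and closes the device
    without the magic character; on drivers that implement magic close, the
    timer then runs out and resets the machine.
    """
    sent = 0
    with open(device, "wb", buffering=0) as wd:   # opening /dev/watchdog arms the timer
        for _ in range(beats):
            if not healthy():
                break
            wd.write(b"\0")          # any write restarts the countdown
            sent += 1
            time.sleep(interval_s)
        if disarm and sent == beats:
            wd.write(MAGIC_CLOSE)    # orderly shutdown: ask the driver to disarm
    return sent
```

Typical use would be something like `pet_watchdog("/dev/watchdog", beats=..., interval_s=10.0, healthy=check_fans)`, where `check_fans` is a hypothetical function reading the sensor hardware. Note that drivers without magic-close support disarm on any close, so a production daemon on such hardware should hang on to the open device rather than close it when a check fails.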
Linux gotchas: There are about a dozen distributions at any given time. (Names a few.) They differ in persnickety little details. Recommends picking one, installing it once for each machine-role profile (e.g., Web, ftp, router, fileserver, and workstation), and using each completed installation as a template to crank out others.
PC BIOS and hardware constraints are frustrating. Multi-OS boot setups account for a high percentage of the questions he's asked. Packages aren't as well integrated as in FreeBSD. Performance tuning: the "noatime" mount option gives a very noticeable speed-up, in some cases approaching double the disk throughput. Linux's native ext2 filesystem is very fast even without it. Linux supports a number of other filesystems, and experimental ones are being developed for special purposes. Recommends running ntpdate at startup and xntpd during operation for very accurate clock synchronisation.
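Both ntpdate and xntpd rest on the same per-exchange arithmetic from the NTP specification: four timestamps from one request/response give the clock offset and the round-trip delay. xntpd then filters many such samples and steers the clock's frequency, which this sketch leaves out; the timestamps in the example are made up.

```python
def ntp_offset_delay(t1: float, t2: float, t3: float, t4: float) -> tuple[float, float]:
    """Clock offset and round-trip delay from one NTP request/response exchange.

    t1: client transmit time (client clock)   t2: server receive time (server clock)
    t3: server transmit time (server clock)   t4: client receive time (client clock)
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2   # positive: client clock is behind the server
    delay = (t4 - t1) - (t3 - t2)          # network round trip, minus server processing
    return offset, delay


# Made-up exchange: client clock 0.5 s slow, 0.1 s each way, 0.01 s at the server.
offset, delay = ntp_offset_delay(100.0, 100.6, 100.61, 100.21)
print(f"offset {offset:.3f} s, delay {delay:.3f} s")
```

The offset formula assumes the outbound and return network paths take equal time, which is why NTP prefers servers with short, symmetric round trips.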
Recommends checking "freshmeat.net" frequently, regardless of which OS you use, since a dozen or more new or updated packages are posted there every day and most of that code is portable. To be added to Linux soon: a journaling filesystem, ACLs, and much else that I couldn't copy down.
5. Bob Palowoda (Solaris Performance Expert at Sun
Microsystems, re: Solaris x86): As part of his job, tests Xeon,
Merced (which he can't talk about), Intel BX motherboards on
Solaris vs. other Unixes. Says that x86 hardware now rivals the
SPARC systems for performance. Classifies "small" servers as dual
Pentium Pros, "medium" as dual PII/400, "high-end" as quad-Xeon
systems, e.g., from NCR, Siemens, Fujitsu. NCR has the fastest
SPECweb server in the world, surpassing even a large IBM box in
tests. Solaris has good, fine-grained SMP, but there is a
shortage of programmers able to take advantage of it.
[Presents a series of illegible overhead foils with tiny type, purporting to show that Solaris's internal memory transfer rates, local transfer rates, some other measures are faster than the competition's.]
Disk drive tuning: transfer rates on UFS are 15 MB/sec using Seagate Cheetahs and 12 MB/sec using IBM drives. Compares pthread creation time against Linux, showing Linux to be much slower. Linux and FreeBSD do not yet have a well-developed SMP system. Solaris now costs $20 for personal use, which he resents, since he doesn't think it appropriate to do that with the developers' work.
Tuning parameters: Solaris dynamically tunes a lot of them. (Details many others.) Recommends Mylex host adapters with 3-5 channels and on-board cache RAM. I2O is now supported on Solaris 2.7. The Veritas journaling filesystem has just been ported, and is very fast, but has been observed to flake out a few times. (Logging is now supported on UFS.)
Describes the Solaris for ISP bundle, with sundry packages including the HighWind news server, SSL, Java-based administration tools, LDAP, both Andrew2 and UW IMAP servers, and GSSAPI authentication, which can deal with both real Kerberos and Microsoft-gimmicked Kerberos.
Joint Q&A session:
What is Best's failure rate on the x86 FreeBSD boxes? Mostly occasional drive failures; now and then Ethernet cards and (before ECC) RAM; a few power supplies, one fan, little things. Everything except disk drives can be recovered from essentially instantly, thanks to the modular rack-mount design. The hardware is not quite as reliable as SPARC, but cheap PC parts make it practical to keep spares of everything around.
Which filesystems have 64-bit support? Solaris UFS has it, and filesystems can be up to 8 terabytes. Jason reports similar results on NetBSD. Matt: fsck can take a long time on UFS; performance with soft updates is competitive, and soft updates fix the fsck problem. Jim: Linux can handle very large filesystems on 64-bit CPU systems, e.g., DEC Alpha. On 32-bit systems, the maximum per volume is 2 GB.
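The sizes quoted above line up with integer widths. The notes don't say what imposes the 2 GB limit on 32-bit Linux; assuming it is a signed 32-bit byte offset (a common cause of that ceiling), the arithmetic is:

```python
GIB = 2**30
TIB = 2**40

# A signed 32-bit byte offset (e.g. a 32-bit off_t) tops out just under 2 GiB.
max_offset_32 = 2**31 - 1
print(max_offset_32, max_offset_32 / GIB)

# 8 TB is 2**43 bytes, so addressing it by byte needs 44-bit offsets,
# well beyond any 32-bit integer.
eight_tb = 8 * TIB
print(eight_tb == 2**43, eight_tb.bit_length())
```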
When will there be serial-console support for Linux? Jim: It has been available as patches for 2 years, and is built into the 2.1.x development kernels. (Solaris has it.) A company in Canada, Canada Connect, is developing add-on hardware to do _full_ serial console, including access to BIOS Setup and host-adapter setup such as Adaptec's Ctrl-A. The product is called "PC Weasel 2000" (?).
On Solaris, any chance of retrofitting support for new hardware into version 2.5? No, and 2.5 will soon be totally unsupported.
Each of the panelists represents a different approach to source-code licencing. Could each of you describe the advantages of your licence arrangements, and explain why you feel it's the right path for the future? [The moderator whapped this person with a clue stick, and disallowed the question.]
Question about rack-mounting, which I didn't quite manage to transcribe: Matt stressed that rack-mounting allows reliable, short ultra-wide SCSI cabling, and good cooling for the disk drives, both of which he considers key for reliable operation.
Is there an option to do remote machine builds? Matt: At Best, we install from a template machine using an NFS-supporting boot floppy. Jim: Linux boot floppies can do NFS-client/bootp without modification. (Can also do remote tape/other storage access via rsh and tar or cpio.)
How do the various OSes handle Y2K? Sun has a certification program. NetBSD 1.2 (latest) has been somewhat tested, no reports of new problems since then. No known problems in 1.3.2 beta. Linux: As usual, it depends. Pieces come from all manner of origins. All the GNU utilities have been fixed. The kernel doesn't have a problem. Everything else is app-dependent. Matt: (Describes the fact that machines run ultra-accurate NTP, for some reason.) Seems to say that there are no known problems. BSDI is eyeballing problem spots, and is now certifying BSD OS for insurance purposes. Real-time clocks in PC hardware are often unfixably defective for Y2K purposes.
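The notes don't say exactly what makes PC real-time clocks unfixable, but one well-known wrinkle is that the classic PC RTC keeps only a two-digit year, usually in packed BCD, with the century stored separately in CMOS by the BIOS rather than advanced by the clock chip. Software reading that register must guess the century. A minimal sketch of the naive guess and a fixed-window repair; the function names are hypothetical, and the pivot of 69 is borrowed from the POSIX strptime %y rule.

```python
def bcd_to_int(b: int) -> int:
    """Decode one packed-BCD byte, the format PC RTC registers commonly use (0x99 -> 99)."""
    return (b >> 4) * 10 + (b & 0x0F)


def naive_year(yy: int) -> int:
    """The classic Y2K bug: assume every two-digit year is in the 1900s."""
    return 1900 + yy


def windowed_year(yy: int, pivot: int = 69) -> int:
    """Fixed-window repair: yy >= pivot means 19yy, otherwise 20yy.

    A pivot of 69 matches the POSIX strptime %y rule
    (69-99 -> 1969-1999, 00-68 -> 2000-2068).
    """
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year")
    return 1900 + yy if yy >= pivot else 2000 + yy


rtc_year_register = 0x00   # what a two-digit clock shows on 1 January 2000
yy = bcd_to_int(rtc_year_register)
print(naive_year(yy), windowed_year(yy))
```

Windowing only postpones the ambiguity, which is one reason OS vendors prefer keeping time internally as a count of seconds rather than as calendar fields.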
(Discussion about UPSes and recovery from extended power failures.)
This was a very long meeting, running to about 10:30 pm. About
sixteen of us then adjourned to the Peppermill restaurant.