Apache - An Overview
by Rick Moen
Revised: Wednesday, 2001-11-21
Master version will be at
http://linuxmafia.com/faq/Web/apache-lecture.html
and I'll try to improve it there.
The original HTTP daemon was CERN httpd, from the Conseil Européen pour la Recherche Nucléaire (CERN), released in 1991. Two years later, Marc Andreessen at the National Center for Supercomputing Applications (NCSA) at the University of Illinois released the Mosaic Web browser, starting the Web boom, and Rob McCool at NCSA released NCSA httpd at about the same time.[1]
The two server packages were under what would now be called open-source licences; NCSA's httpd rapidly became much more popular than CERN's because of its performance advantage. (Mosaic was always under a gratis-for-noncommercial-usage-only licence.)
The following year (1994), though, both Andreessen and McCool left the University of Illinois and their respective projects, which began to stagnate. Andreessen and McCool co-founded Mosaic Communications, Inc., to develop what was initially named the "Mozilla" Web browser (implying better than Mosaic). After NCSA complained of trademark infringement, the firm was obliged to rename itself Netscape Communications and to rename the browser Netscape Navigator. However, the "Mozilla" gag lived on in the final line of every version's README: "And remember, it's spelled N-e-t-s-c-a-p-e, but it's pronounced Mozilla."
Somewhat surprisingly, in 1995 McCool rejoined the NCSA httpd effort indirectly, even though he was then working on Netscape's proprietary competitor, Netscape Commerce Server: he joined an on-line community of programmers who had independently revived his codebase, which had become somewhat rudderless at NCSA. The group and project were initially nameless, but when they set up a central mailing list in February 1995 (hosted initially on group member Brian Behlendorf's server machine), they named the project "Apache" after Behlendorf's name for one of the post-1.3 betas. Only then did the group realise the inadvertent pun they'd serendipitously carried out: their version was "a patchy" version of NCSA httpd 1.3.
Eventually NCSA released open-source httpd version 1.4.2, and then the Board of Regents saw the prospect of profit, and imposed a proprietary licence starting with v. 1.5. The project immediately went stagnant and died.
Initial Apache group members were Behlendorf, Robert S. Thau, Rob Hartill, Randy Terbush, David Robinson, Cliff Skolnick, Andrew Wilson, and Roy T. Fielding. McCool rejoined the effort afterwards. Uniquely among the major open-source projects, the Apache Group makes all decisions by a voting/consensus model over e-mail [2], and has kept its source code under CVS source control since 1996. There's no other hierarchy or formal structure (except that in 1999 it created an umbrella non-profit corporation, the Apache Software Foundation).
In 1998, Apache received a major boost when IBM, which had decided to discontinue its own proprietary Domino Go web server package and had in fact adopted Apache firm-wide (as "IBM HTTP Server"), began ongoing code contributions, including multithreading support for the future 2.0 release and better MS-Windows NT support.[3] Two IBM developers (Manoj Kasichainula and Bill Stoddard) became full members of the Apache Group at that time, and two others also contributed.
Apache and its derivatives now run more than 60% of Internet web domains (Source: Netcraft, May, 1999).
Feature Set:
- Pre-forking HTTP/1.1 server that runs standalone (running from inetd is supported but deprecated)
- Runs by default on port 80, but any other port can be used instead or in addition
- Virtual host support, both old-style IP-based and HTTP/1.1-style "NameVirtualHost"
- Cross-platform (Unixes, Win32, OS/2, BeOS, MacOS, NetWare, BS2000/OSD on System/390)
- CGI (Common Gateway Interface, initially called "htbin" at NCSA) for calling external processes, at the cost of a fork-and-exec per request
- Modular architecture (DSOs = Dynamic Shared Objects) in addition to CGI: modules run inside the Apache processes, saving the fork-and-exec cost, and are dynamically loadable [4] without recompiling [5]
- Also supports NSAPI (Netscape Server Application Program Interface)
- Server-side includes (SSIs) = code embedded in documents. PHP is a special example of this, implemented (for example, on SourceForge) as an Apache DSO.
- Can authenticate against LDAP, Berkeley DB, NIS
- Supports custom error handlers
- Content negotiation: sending the browser the language or type of document it says it can best deal with
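Several of the features above correspond directly to httpd.conf directives. Here is a minimal sketch in Apache 1.3 syntax; the extra port 8080, the error-page paths, the document-tree path, and the language list are hypothetical placeholders, not recommendations.

```apache
# Answer on port 80 and, in addition, on port 8080 (hypothetical extra port).
# Once any Listen directive is present, Apache binds only the listed ports,
# so port 80 must be listed too.
Listen 80
Listen 8080

# Custom error handlers: a static page or a CGI script (hypothetical paths)
ErrorDocument 404 /errors/notfound.html
ErrorDocument 500 /cgi-bin/server-error.cgi

# Content negotiation: with MultiViews, a request for /page can be answered
# from page.html.en or page.html.de according to the browser's
# Accept-Language header; LanguagePriority applies when it states no preference
AddLanguage en .en
AddLanguage de .de
LanguagePriority en de
<Directory "/usr/local/apache/htdocs">
    Options +MultiViews
</Directory>
```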
Configuration:
In the simplest sense, Apache knows via httpd.conf [6] about one or more locations (the "DocumentRoot" and other directories mapped into its URL space) holding the HTML and similar documents that it serves. More complex configurations allow SSIs, and extended SSI-like mechanisms such as PHP, which are parsed before the resulting HTML or other output is sent out.
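A minimal sketch of both levels in Apache 1.3 syntax follows. All paths are hypothetical, and the PHP line assumes the PHP 4 module has been loaded.

```apache
# The main document tree (hypothetical path)
DocumentRoot "/usr/local/apache/htdocs"

# Map another directory into the URL space alongside the DocumentRoot
Alias /manual/ "/usr/local/apache/manual/"

# Enable SSIs: mod_include parses *.shtml files before sending them
<Directory "/usr/local/apache/htdocs">
    Options +Includes
</Directory>
AddType text/html .shtml
AddHandler server-parsed .shtml

# Hand *.php files to the PHP module for parsing
AddType application/x-httpd-php .php
```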
Authentication:
- Default is regular anonymous access.
- "AuthType Basic" imposes a per-directory password that travels in effectively plaintext form (merely base64-encoded). Users and passwords live in a file, conventionally .htpasswd (in /etc/passwd format), which the directory's .htaccess file names via AuthUserFile.
- Better is SSL -- which had patent and export-law problems.
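The following sketch shows per-directory Basic authentication in Apache 1.3 syntax. The directory, realm name, and password-file path are hypothetical. The password file is created with the htpasswd utility that ships with Apache (its -c flag creates a new file), and is best kept outside the DocumentRoot so that it cannot itself be downloaded.

```apache
# In httpd.conf: let .htaccess files in this tree set authentication
<Directory "/usr/local/apache/htdocs/private">
    AllowOverride AuthConfig
</Directory>

# In /usr/local/apache/htdocs/private/.htaccess:
AuthType Basic
AuthName "Private Area"
AuthUserFile /usr/local/apache/passwd/.htpasswd
Require valid-user
```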
Legal issues:
Patent issue: SSL (Secure Sockets Layer), aka TLS (Transport Layer Security), as normally deployed relied on the RSA algorithm, which was under patent in the USA until 2000-09-21.
Export-law issue: USA Commerce Dept. imposed export regulations, but relaxed them for open-source crypto, as of late 2000.
Before late 2000, in the USA, you were thus legally obliged to use an Apache SSL add-in from a company that had paid the RSA patent royalty. Those were:
- Covalent Technologies, Inc.'s Raven SSL, http://www.covalent.net/
- C2Net Software's Stronghold, http://www.c2.net/products/sh2/ (C2Net has now been merged into Red Hat Software, Inc.)
- Red Hat Secure Web Server (this was a licensed version of C2Net's software)
You could alternatively solve the functionality problem in either of two other ways, but these were not legal in the USA on account of the patent issue:
- Load the mod_ssl DSO. Supports SSLv2 and SSLv3.
- Use Apache-SSL, a Europe-based fork of Apache created to get around the patent and USA-export problems.
These days, pretty much everyone just enables mod_ssl.
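A minimal mod_ssl setup might look like the sketch below. The module path follows the SourceForge excerpt further down; the certificate and key paths are hypothetical, and a configuration that uses ClearModuleList would also need an "AddModule mod_ssl.c" line.

```apache
# Load mod_ssl as a DSO
LoadModule ssl_module libexec/libssl.so

# Listing any port with Listen disables the default binding, so keep port 80
Listen 80

<IfModule mod_ssl.c>
    Listen 443
    <VirtualHost _default_:443>
        SSLEngine on
        # Refuse the weaker SSLv2 protocol
        SSLProtocol all -SSLv2
        # Hypothetical certificate and private-key locations
        SSLCertificateFile /usr/local/apache/conf/ssl.crt/server.crt
        SSLCertificateKeyFile /usr/local/apache/conf/ssl.key/server.key
    </VirtualHost>
</IfModule>
```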
Detail on Virtual Hosting:
This is one of Apache's most valuable features (used, for example, to generate SourceForge's project Web pages). A single Apache server can serve up many different document roots, answering queries for many separate hostnames.
The older method, which works on every Web browser, is IP-based virtual hosting. In this, the underlying OS is configured to answer queries for many individual IP addresses. (In Linux, one uses IP aliasing, assigning IP addresses to virtual Ethernet devices like eth0:0, eth0:1, etc.) Advantage: Even the AOL 3.0 browser will work with this. Disadvantage: It chews up IP addresses.
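A sketch of IP-based virtual hosting in Apache 1.3 syntax; the addresses, hostnames, and paths are placeholders.

```apache
# One address per site. The OS must already answer on both addresses,
# for example via IP aliases on eth0:0 and eth0:1.
<VirtualHost 192.0.2.10>
    ServerName www.example.com
    DocumentRoot /home/www/example.com
</VirtualHost>

<VirtualHost 192.0.2.11>
    ServerName www.example.org
    DocumentRoot /home/www/example.org
</VirtualHost>
```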
The modern way is the NameVirtualHost directive in httpd.conf. In this method, Apache serves up different document trees depending not on the IP address reached, but on the hostname the client names in its request's HTTP "Host:" header. This method works with any Web browser that supports HTTP/1.1, which at this late date is all but the most ancient and long-discontinued browsers.
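The name-based equivalent, again a sketch in Apache 1.3 syntax with a placeholder address, hostnames, and paths:

```apache
# Many hostnames share one address. Apache picks the block whose
# ServerName or ServerAlias matches the request's Host: header; the first
# block listed also answers requests with no matching Host: header.
NameVirtualHost 192.0.2.10

<VirtualHost 192.0.2.10>
    ServerName www.example.com
    DocumentRoot /home/www/example.com
</VirtualHost>

<VirtualHost 192.0.2.10>
    ServerName www.example.org
    ServerAlias example.org
    DocumentRoot /home/www/example.org
</VirtualHost>
```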
The second method is obviously dependent on correct and authoritative forward DNS setup for the domains being served. This is why SourceForge's "project Web" feature in practice requires that the customer delegate SourceForge's DNS subdomain to a nameserver process running on one of the SourceForge boxes.
Example excerpt from the SourceForge 3.0 httpd.conf file:
LoadModule env_module libexec/mod_env.so
LoadModule config_log_module libexec/mod_log_config.so
LoadModule mime_magic_module libexec/mod_mime_magic.so
LoadModule mime_module libexec/mod_mime.so
LoadModule negotiation_module libexec/mod_negotiation.so
LoadModule status_module libexec/mod_status.so
LoadModule dir_module libexec/mod_dir.so
LoadModule alias_module libexec/mod_alias.so
LoadModule access_module libexec/mod_access.so
LoadModule auth_module libexec/mod_auth.so
LoadModule setenvif_module libexec/mod_setenvif.so
LoadModule php4_module libexec/libphp4.so
LoadModule ssl_module libexec/libssl.so
LoadModule cgi_module libexec/mod_cgi.so

ClearModuleList
AddModule mod_env.c
AddModule mod_log_config.c
AddModule mod_mime_magic.c
AddModule mod_mime.c
AddModule mod_negotiation.c
AddModule mod_status.c
AddModule mod_dir.c
AddModule mod_access.c
AddModule mod_auth.c
AddModule mod_so.c
AddModule mod_setenvif.c
AddModule mod_php4.c
AddModule mod_ssl.c
AddModule mod_alias.c
AddModule mod_cgi.c

ServerType standalone
ServerRoot "/sourceforge/apache"
DocumentRoot "/sourceforge/sfee/www"
ServerName mutt.linuxmafia.com
ServerAdmin rmoen@mutt.linuxmafia.com
PidFile /sourceforge/var/run/httpd.pid
ScoreBoardFile /sourceforge/log/apache/httpd.scoreboard
Timeout 300
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 15
MinSpareServers 5
MaxSpareServers 10
StartServers 20
MaxClients 256
MaxRequestsPerChild 10000
Port 443
Listen 80
Listen 443
User sf-httpd
Group sf-httpd
HostnameLookups Off
UseCanonicalName Off
LogLevel warn
ServerSignature On
AddType application/x-httpd-php .php
MIMEMagicFile /sourceforge/etc/apache/magic
TypesConfig /sourceforge/etc/apache/mime.types
DefaultType text/plain
DirectoryIndex index.html index.php index.cgi

<IfDefine SFEE>
    Alias /download/ /sourceforge/var/frs_files/
    <Directory "/sourceforge/var/frs_files/">
        SetEnvIf Remote_Addr "." sfee=1
    </Directory>
    <Directory "/sourceforge/sfee/www">
        <Files projects>
            ForceType application/x-httpd-php
        </Files>
        <Files users>
            ForceType application/x-httpd-php
        </Files>
        <Files docs>
            ForceType application/x-httpd-php
        </Files>
        <Files foundry>
            ForceType application/x-httpd-php
        </Files>
        SetEnvIf Remote_Addr "." sfee=1
    </Directory>
    <Directory "/sourceforge/sfee/www/pm/reporting">
        <Files category_tasks.png>
            ForceType application/x-httpd-php
        </Files>
        <Files closed_tasks.png>
            ForceType application/x-httpd-php
        </Files>
        <Files incomplete_tasks.png>
            ForceType application/x-httpd-php
        </Files>
        <Files started_tasks.png>
            ForceType application/x-httpd-php
        </Files>
        <Files technician_tasks.png>
            ForceType application/x-httpd-php
        </Files>
    </Directory>
    <Directory "/sourceforge/sfee/www/project/stats">
        <Files stats_graph.png>
            ForceType application/x-httpd-php
        </Files>
    </Directory>
    <Directory "/sourceforge/sfee/www/stats">
        <Files users_graph.png>
            ForceType application/x-httpd-php
        </Files>
        <Files views_graph.png>
            ForceType application/x-httpd-php
        </Files>
        <Files weekly_views.png>
            ForceType application/x-httpd-php
        </Files>
    </Directory>
    <Directory "/sourceforge/sfee/www/tracker/reporting">
        <Files aging_report.png>
            ForceType application/x-httpd-php
        </Files>
        <Files distribution_report.png>
            ForceType application/x-httpd-php
        </Files>
    </Directory>
</IfDefine>

<IfDefine LIST>
    Alias /mailman/archives/ /sourceforge/mailman/archives/public/
    ScriptAlias /mailman/ /sourceforge/mailman/cgi-bin/
    <Directory "/sourceforge/mailman">
        Options ExecCGI FollowSymLinks
        AllowOverride None
        Order allow,deny
        Allow from all
        SetEnvIf Remote_Addr "." lists=1
    </Directory>
</IfDefine>

<IfDefine CVS>
    # chora temporarily in sourceforge/sfee/www/horde/chora
    # note also changes to paths in the Directory block below
    # rmoen 2001.10.24
    #
    # Alias /horde/chora/ "/sourceforge/chora/horde/chora/"
    Alias /icons/ "/sourceforge/apache/icons/"
    <Directory "/sourceforge/sfee/www/horde">
        php_flag magic_quotes_gpc off
        php_flag short_open_tag on
        Options ExecCGI FollowSymLinks
        AllowOverride None
        Order allow,deny
        Allow from all
        SetEnvIf Remote_Addr "." cvs=1
    </Directory>
</IfDefine>

<VirtualHost _default_:443>
    SSLEngine on
    SSLCertificateFile /sourceforge/etc/apache/ssl.crt/server.crt
    SSLCertificateKeyFile /sourceforge/etc/apache/ssl.key/server.key
</VirtualHost>
Security:
Apache initially starts as the root user (because it needs to bind to privileged port 80/tcp). It then immediately forks off a configurable number of child processes that run as an unprivileged user and answer incoming requests.
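The relevant directives, sketched in Apache 1.3 syntax. The account name www is a hypothetical placeholder (the SourceForge excerpt above uses sf-httpd).

```apache
# The root-owned parent process binds this privileged port...
Port 80
# ...then forks this many children at startup...
StartServers 5
# ...which switch to this unprivileged account before answering requests
User www
Group www
```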
Log, conf, and binary directories (and their parents) should be root-owned, permissions 755.
Ideally, cgi-bin directories should not be remotely readable. Confine CGI scripts to that directory. (Only trusted users should be allowed to write/modify CGIs.)
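One way to do this, following the layout of Apache 1.3's stock httpd.conf, is ScriptAlias; the path below is a hypothetical placeholder.

```apache
# Keep CGIs in one directory outside the DocumentRoot. ScriptAlias marks
# everything there as a script to execute, so script sources are never
# sent to clients as plain files.
ScriptAlias /cgi-bin/ "/usr/local/apache/cgi-bin/"

<Directory "/usr/local/apache/cgi-bin">
    AllowOverride None
    Options None
    Order allow,deny
    Allow from all
</Directory>
```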
Set "AllowOverride None" and "Options None" as directory defaults. Otherwise, users can override some system security settings with .htaccess files.
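A sketch of such defaults, in the spirit of Apache's own security tips: deny everything at the filesystem root, then re-enable only the document tree. The htdocs path is a hypothetical placeholder; any other directory that should be served (UserDir trees, CGI directories) needs its own Allow.

```apache
# Restrictive defaults for the whole filesystem: no options,
# no .htaccess overrides, and no access unless granted below
<Directory />
    Options None
    AllowOverride None
    Order deny,allow
    Deny from all
</Directory>

# Then open up only the document tree
<Directory "/usr/local/apache/htdocs">
    Options None
    AllowOverride None
    Order allow,deny
    Allow from all
</Directory>
```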
Performance:
Apache 1.3.x disables reverse DNS lookups of client addresses by default. Don't set "HostnameLookups On" unless you can tolerate greatly increased DNS overhead.
LogLevel: Set this directive to one of the lower verbosity levels (such as error or crit). Extremely laconic levels (alert, emerg) are unwise.
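In httpd.conf terms, the two recommendations above amount to the following sketch. If hostnames are wanted in the access logs, the logresolve utility shipped with Apache can resolve the logged addresses afterwards, offline.

```apache
# Log client IP addresses only; no reverse DNS lookup per request
HostnameLookups Off
# Log only genuine errors and worse (the stock 1.3 setting is warn)
LogLevel error
```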
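Symlink checking costs processor time. "Options FollowSymLinks" is in fact the cheapest setting, because Apache then skips symlink checks entirely; leaving it off, or using SymLinksIfOwnerMatch, makes Apache issue an extra lstat() system call for each component of the requested path. Favour SymLinksIfOwnerMatch only where its extra protection is worth that cost.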
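Apache's own performance notes describe the trade-off this way: unless FollowSymLinks is set (and SymLinksIfOwnerMatch is not), Apache must lstat() each path component to look for symlinks. A sketch with hypothetical paths:

```apache
# Fastest: Apache follows symlinks without per-component lstat() checks
<Directory "/usr/local/apache/htdocs">
    Options FollowSymLinks
</Directory>

# Safer where users can create symlinks, at the cost of extra lstat() calls
<Directory "/home/*/public_html">
    Options SymLinksIfOwnerMatch
</Directory>
```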
Content negotiation, such as letting "DirectoryIndex index" pick among index.* files via MultiViews, is costly. Instead, use a complete list of the expected matches, most common first:
DirectoryIndex index.cgi index.pl index.shtml index.html
Leave MinSpareServers, MaxSpareServers and StartServers at their defaults unless you have a really good reason. The same goes for KeepAliveTimeout (default 15 seconds).
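For reference, here are the values Apache 1.3 ships with in its default httpd.conf, written out explicitly; a configuration containing just these lines behaves the same as one that omits them.

```apache
# Stock Apache 1.3 process-management and keep-alive values
StartServers 5
MinSpareServers 5
MaxSpareServers 10
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 15
```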
Troubleshooting:
Check the error log first.
If you've made serious errors in httpd.conf, Apache may die silently on startup. Such errors can be hard to track down by normal means. In that case, run the bare httpd binary from the command line (not via the SysVInit scripts), so that its stdout and stderr output is not suppressed and appears on the console. (Make sure you're the root user.)
Further Reading:
"The Apache Story", article in Linux Magazine by Rob McCool,
Roy T. Fielding, and Brian Behlendorf
http://www.linux-mag.com/1999-06/apache_01.html
"Running a Web Server under Linux", article in Linux
Magazine by Jim Dennis
http://www.linux-mag.com/1999-06/guru_01.html
"IBM and Apache Plan Their First Date", slide presentation
by Manoj Kasichainula
http://www.io.com/~manojk/ibmapache/
Apache v. 1.3.x Documentation (only brave people so far run
2.0 betas)
http://httpd.apache.org/docs/
Apache FAQ
http://httpd.apache.org/docs/misc/FAQ.html
Notes:
[1] Andreessen and McCool invented for Mosaic and NCSA httpd the concept of URLs, previously unknown in CERN's software.
[2] Each project member can cast one vote, +1 or -1. Changing the codebase requires at least three plus votes and no minus ones. Other actions require at least three plus votes and an overall positive majority on votes cast.
[3] IBM's Java-oriented WebSphere product is also based on Apache.
[4] To my knowledge, you cannot unload an Apache DSO while the server is running.
[5] There's a vast variety of modules provided in the stock Apache 1.3.x package, including mod_speling (yes, spelled that way), which detects and corrects common misspellings in URLs, and mod_mime_magic, which autodetects documents' correct MIME types using Unix "magic numbers" and other hints from the filesystem (instead of just filename extensions). Others include mod_rewrite, which can rewrite URLs in almost arbitrary ways using regular-expression rules, and mod_proxy, which provides much of the Squid Web cache's feature set right inside Apache.
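A small sketch of two of these modules in use, assuming both are compiled in or loaded with LoadModule; the /docs/ URL layout is hypothetical.

```apache
# mod_speling: instead of a plain 404, look for documents whose names
# differ only in capitalization or by a single-character misspelling
CheckSpelling on

# mod_rewrite: permanently redirect old *.htm URLs under /docs/ to *.html
RewriteEngine on
RewriteRule ^/docs/(.*)\.htm$ /docs/$1.html [R=permanent,L]
```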
[6] On Red Hat, this would be in /etc/httpd/conf/. On Debian, it would be in /etc/apache/. You will also find some mentions of srm.conf and access.conf: what is now solely in Apache's httpd.conf file used to be split among all three files, for backwards compatibility with NCSA httpd. Recent Apache versions finally abandon that unnecessary complexity.
The same directory will also contain Apache's mime.types file, technically also part of Apache's configuration set.
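A sketch of how an Apache 1.3 httpd.conf can keep everything in one file: by default the server still looks for srm.conf and access.conf, and pointing those directives at /dev/null (on Unix) switches that off. The mime.types path shown is an assumption, relative to ServerRoot.

```apache
# Don't read the legacy NCSA-style config files
ResourceConfig /dev/null
AccessConfig /dev/null
# Where Apache finds its MIME type table (path relative to ServerRoot)
TypesConfig conf/mime.types
```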