Apache - An Overview

by Rick Moen

Revised: Wednesday, 2001-11-21

Master version will be at
http://linuxmafia.com/faq/Web/apache-lecture.html
and I'll try to improve it there.


The original HTTP daemon was CERN httpd, released in 1991 by CERN (Conseil Européen pour la Recherche Nucléaire). Two years later, Marc Andreessen at the National Center for Supercomputing Applications (NCSA) at the University of Illinois released the Mosaic Web browser, starting the Web boom, and Rob McCool at NCSA released NCSA httpd about the same time.[1]

The two server packages were under what would now be called open-source licences; NCSA's httpd rapidly became much more popular than CERN's because of its performance advantage. (Mosaic was always under a gratis-for-noncommercial-usage-only licence.)

The following year (1994), though, both Andreessen and McCool graduated from the University of Illinois and left their respective projects, which began to stagnate. Andreessen and McCool co-founded Mosaic Communications, Inc., to develop what was initially named the "Mozilla" Web browser (implying better than Mosaic). Subsequently, the firm was obliged to rename itself Netscape Communications after NCSA complained of trademark infringement, and to rename the browser Netscape Navigator. However, the "Mozilla" gag lived on in the final line of every version's README: "And remember, it's spelled N-e-t-s-c-a-p-e, but it's pronounced Mozilla."

Somewhat surprisingly, in late 1995, McCool rejoined the NCSA httpd project indirectly, by joining an on-line community of programmers who had independently revived his codebase, which was somewhat rudderless at NCSA at that time -- even though he was working on Netscape's proprietary competitor, Netscape Commerce Server. The group and project were initially nameless, but, when they decided in February 1995 on a central mailing list (hosted initially on group member Brian Behlendorf's server machine), they named it "Apache" after Behlendorf's name for one of the post-1.3 betas. Only then did the group realise the inadvertent pun they'd serendipitously carried out: Their version was "a patchy" version of NCSA httpd 1.3.

Eventually NCSA released open-source httpd version 1.4.2, and then the Board of Regents saw the prospect of profit, and imposed a proprietary licence starting with v. 1.5. The project immediately went stagnant and died.

Initial Apache group members were Behlendorf, Robert S. Thau, Rob Hartill, Randy Terbush, David Robinson, Cliff Skolnick, Andrew Wilson, and Roy T. Fielding. McCool rejoined the effort afterwards. Uniquely among the major open-source projects, the Apache Group makes all decisions by a voting/consensus model over e-mail [2], and keeps source code under CVS source control (starting in 1996). There's no other hierarchy or formal structure (except that in 1999 it created an umbrella non-profit corporation, Apache Software Foundation).

In 1998, Apache received a major boost in the form of ongoing code contributions from IBM (including multithread support for the future 2.0 release, and better MS-Windows NT support), which had decided to discontinue its own proprietary Domino Go web server package -- and in fact had adopted Apache firm-wide (as "IBM HTTP Server").[3] Two IBM developers (Manoj Kasichainula and Bill Stoddard) became full members of the Apache Group, at that time -- with two others also contributing.

Apache and its derivatives now run more than 60% of Internet web domains (Source: Netcraft, May, 1999).


Feature Set:


Configuration:

In the simplest sense, Apache knows via httpd.conf [6] about one or more locations ("Document Root" and other directories appended to it) of HTML and similar documents that it serves. More complex configurations allow server-side includes (SSIs) -- and extended SSI-like structures such as PHP -- which are parsed before Apache sends out the resulting HTML or other output.
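
As a minimal sketch of both cases (the paths are hypothetical, and the directives are Apache 1.3.x syntax), a document root plus server-side includes for .shtml files might look like this:

```apache
# Hypothetical document root; URLs map to paths appended to it,
# e.g. /docs/intro.html -> /var/www/html/docs/intro.html
DocumentRoot "/var/www/html"

<Directory "/var/www/html">
    # Permit server-side includes in this tree
    Options +Includes
</Directory>

# Have Apache parse .shtml files for SSI directives before sending them
AddType text/html .shtml
AddHandler server-parsed .shtml
```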


Authentication:


Legal issues:

Patent issue: SSL (Secure Sockets Layer) aka TLS (Transport Layer Security) was under the RSA patent until 2000-09-21.

Export-law issue: USA Commerce Dept. imposed export regulations, but relaxed them for open-source crypto, as of late 2000.

Pre-late-2000, in the USA, you thus were legally obligated to use an Apache SSL add-in from a company that had paid the RSA patent royalty. Those were:

You could alternatively solve the functionality problem either of two other ways, but these were not legal in the USA on account of patent issues:

These days, pretty much everyone just enables mod_ssl.

Detail on Virtual Hosting:

This feature is one of Apache's most-valuable features (and is used, for example, to generate SourceForge's project Web pages). A single Apache server can serve up many different document roots, when the Web server answers queries for many separate hostnames.

The older, works-on-every-Web-browser method is IP-based virtual hosting. In this, the underlying OS is configured to answer queries for many individual IP addresses. (In Linux, one uses IP aliasing, assigning IP addresses to virtual ethernet devices like eth0:0, eth0:1, etc.) Advantage: Even the AOL 3.0 browser will work with this. Disadvantage: It chews up IP addresses.
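
A sketch of the IP-based approach, assuming the OS already answers on two addresses (the addresses, hostnames, and paths below are placeholders, not taken from any real setup):

```apache
# One <VirtualHost> block per IP address configured on the machine
<VirtualHost 192.0.2.11>
    ServerName www.example.com
    DocumentRoot "/var/www/example-com"
</VirtualHost>

<VirtualHost 192.0.2.12>
    ServerName www.example.org
    DocumentRoot "/var/www/example-org"
</VirtualHost>
```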

The modern way is name-based virtual hosting, using the NameVirtualHost directive in httpd.conf. In this method, Apache serves up different document trees depending not on the IP address reached, but rather on the destination hostname specified in the Host: header of the client's HTTP request. This method works with any Web browser capable of supporting HTTP version 1.1 -- which is all but the most ancient and long-discontinued browsers, at this late date.
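
A sketch of the name-based approach in Apache 1.3.x syntax. The single shared address, the hostnames, and the paths are all placeholders:

```apache
# Tell Apache that this address carries several name-based hosts
NameVirtualHost 192.0.2.10

# Apache picks the block whose ServerName matches the requested
# hostname; the first block listed also serves as the fallback
<VirtualHost 192.0.2.10>
    ServerName www.example.com
    DocumentRoot "/var/www/example-com"
</VirtualHost>

<VirtualHost 192.0.2.10>
    ServerName www.example.org
    DocumentRoot "/var/www/example-org"
</VirtualHost>
```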

The second method is obviously dependent on correct and authoritative forward DNS setup for the domains being served. This is why SourceForge's "project Web" feature in practice requires that the customer delegate SourceForge's DNS subdomain to a nameserver process running on one of the SourceForge boxes.


Example excerpt from the SourceForge 3.0 httpd.conf file:

LoadModule env_module           libexec/mod_env.so
LoadModule config_log_module    libexec/mod_log_config.so
LoadModule mime_magic_module    libexec/mod_mime_magic.so
LoadModule mime_module          libexec/mod_mime.so
LoadModule negotiation_module   libexec/mod_negotiation.so
LoadModule status_module        libexec/mod_status.so
LoadModule dir_module           libexec/mod_dir.so
LoadModule alias_module         libexec/mod_alias.so
LoadModule access_module        libexec/mod_access.so
LoadModule auth_module          libexec/mod_auth.so
LoadModule setenvif_module      libexec/mod_setenvif.so
LoadModule php4_module          libexec/libphp4.so
LoadModule ssl_module           libexec/libssl.so
LoadModule cgi_module           libexec/mod_cgi.so
LoadModule php4_module          libexec/libphp4.so

ClearModuleList
AddModule mod_env.c
AddModule mod_log_config.c
AddModule mod_mime_magic.c
AddModule mod_mime.c
AddModule mod_negotiation.c
AddModule mod_status.c
AddModule mod_dir.c
AddModule mod_access.c
AddModule mod_auth.c
AddModule mod_so.c
AddModule mod_setenvif.c
AddModule mod_php4.c
AddModule mod_ssl.c
AddModule mod_alias.c
AddModule mod_cgi.c

ServerType standalone
ServerRoot "/sourceforge/apache"
DocumentRoot "/sourceforge/sfee/www"
ServerName mutt.linuxmafia.com
ServerAdmin rmoen@mutt.linuxmafia.com
PidFile /sourceforge/var/run/httpd.pid
ScoreBoardFile /sourceforge/log/apache/httpd.scoreboard
Timeout 300
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 15

MinSpareServers 5
MaxSpareServers 10
StartServers 20
MaxClients 256
MaxRequestsPerChild 10000

Port 443
Listen 80
Listen 443
User sf-httpd
Group sf-httpd
HostnameLookups Off
UseCanonicalName Off

LogLevel warn
ServerSignature On
AddType application/x-httpd-php .php
MIMEMagicFile /sourceforge/etc/apache/magic
TypesConfig /sourceforge/etc/apache/mime.types
DefaultType text/plain

DirectoryIndex index.html index.php index.cgi

<IfDefine SFEE>
        Alias /download/ /sourceforge/var/frs_files/
        <Directory "/sourceforge/var/frs_files/">
                SetEnvIf Remote_Addr "." sfee=1
        </Directory>
        <Directory "/sourceforge/sfee/www">
                <Files projects>
                  ForceType application/x-httpd-php
                </Files>
                <Files users>
                  ForceType application/x-httpd-php
                </Files>
                <Files docs>
                  ForceType application/x-httpd-php
                </Files>
                <Files foundry>
                  ForceType application/x-httpd-php
                </Files>
                SetEnvIf Remote_Addr "." sfee=1
        </Directory>
        <Directory "/sourceforge/sfee/www/pm/reporting">
                <Files category_tasks.png>
                  ForceType application/x-httpd-php
                </Files>
                <Files closed_tasks.png>
                  ForceType application/x-httpd-php
                </Files>
                <Files incomplete_tasks.png>
                  ForceType application/x-httpd-php
                </Files>
                <Files started_tasks.png>
                  ForceType application/x-httpd-php
                </Files>
                <Files technician_tasks.png>
                  ForceType application/x-httpd-php
                </Files>
        </Directory>
        <Directory "/sourceforge/sfee/www/project/stats">
                <Files stats_graph.png>
                  ForceType application/x-httpd-php
                </Files>
        </Directory>
        <Directory "/sourceforge/sfee/www/stats">
                <Files users_graph.png>
                  ForceType application/x-httpd-php
                </Files>
                <Files views_graph.png>
                  ForceType application/x-httpd-php
                </Files>
                <Files weekly_views.png>
                  ForceType application/x-httpd-php
                </Files>
        </Directory>
        <Directory "/sourceforge/sfee/www/tracker/reporting">
                <Files aging_report.png>
                  ForceType application/x-httpd-php
                </Files>
                <Files distribution_report.png>
                  ForceType application/x-httpd-php
                </Files>
        </Directory>
</IfDefine>

<IfDefine LIST>
        Alias /mailman/archives/ /sourceforge/mailman/archives/public/
        ScriptAlias /mailman/    /sourceforge/mailman/cgi-bin/
        <Directory "/sourceforge/mailman">
                Options ExecCGI FollowSymLinks
                AllowOverride None
                Order allow,deny
                Allow from all
                SetEnvIf Remote_Addr "." lists=1
        </Directory>
</IfDefine>

<IfDefine CVS>
        # chora temporarily in sourceforge/sfee/www/horde/chora
        # note also changes to paths in the Directory block below
        # rmoen 2001.10.24
        #
        # Alias /horde/chora/ "/sourceforge/chora/horde/chora/"
        Alias /icons/ "/sourceforge/apache/icons/"
        <Directory "/sourceforge/sfee/www/horde">
                php_flag magic_quotes_gpc off
                php_flag short_open_tag on
                Options ExecCGI FollowSymLinks
                AllowOverride None
                Order allow,deny
                Allow from all
                SetEnvIf Remote_Addr "." cvs=1
        </Directory>
</IfDefine>

<VirtualHost _default_:443>
        SSLEngine on
        SSLCertificateFile /sourceforge/etc/apache/ssl.crt/server.crt
        SSLCertificateKeyFile /sourceforge/etc/apache/ssl.key/server.key
</VirtualHost>

Security:

Apache initially starts as the root user (because it needs to bind to privileged port 80/tcp). It then immediately forks off a configurable number of processes that run as an unprivileged user, which handle answering of incoming requests.

Log, conf, and binary directories (and their parents) should be root-owned, permissions 755.

Ideally, cgi-bin directories should not be remotely readable. Confine CGI scripts to that directory. (Only trusted users should be allowed to write/modify CGIs.)
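
One way to express this, modelled on the stock 1.3.x configuration (the filesystem path is an assumption; adjust it to your layout):

```apache
# Map /cgi-bin/ URLs to a single directory outside the document root;
# files in it are executed as CGI scripts rather than served as text
ScriptAlias /cgi-bin/ "/usr/local/apache/cgi-bin/"

<Directory "/usr/local/apache/cgi-bin">
    AllowOverride None
    Options None
    Order allow,deny
    Allow from all
</Directory>
```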

Set "AllowOverride None" and "Options None" as directory defaults. Otherwise, users can override some system security settings with .htaccess files.
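
A sketch of such defaults, with one tree selectively opened up afterwards (the document-root path is hypothetical):

```apache
# Restrictive defaults for the whole filesystem:
# no optional features, and .htaccess files are ignored
<Directory />
    Options None
    AllowOverride None
</Directory>

# Re-enable only what a given tree actually needs
<Directory "/var/www/html">
    Options Indexes
    Order allow,deny
    Allow from all
</Directory>
```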


Performance:

Apache 1.3.x now defaults to disabling DNS lookups on all connections. Don't do "HostnameLookups on" unless you can stand greatly increased DNS overhead.
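
If only a few resources really need client hostnames (say, for a CGI script that reads them), lookups can be scoped rather than enabled globally. A sketch along the lines of the example in Apache's own performance notes; the file pattern is an assumption:

```apache
# No reverse-DNS lookups by default
HostnameLookups Off

# Resolve client hostnames only for requests matching these files
<Files ~ "\.(html|cgi)$">
    HostnameLookups On
</Files>
```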

LogLevel: Set this to one of the less verbose levels (such as error or crit). Extremely laconic levels (alert, emerg) are unwise.
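
For instance (the log path is an assumption, taken relative to ServerRoot):

```apache
# Log errors and anything more severe; skip warn, notice, info, debug
LogLevel error
ErrorLog logs/error_log
```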

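Symlink checking costs processor time. Wherever "Options FollowSymLinks" is absent, or "Options SymLinksIfOwnerMatch" is set, Apache must issue extra lstat() calls on each component of the requested path. Where security permits, favour FollowSymLinks, and confine SymLinksIfOwnerMatch to the directories that need it, to minimise the cost.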
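
Both settings are values of the Options directive, applied per directory. A sketch with hypothetical paths, showing the syntax only:

```apache
<Directory "/var/www/html">
    # Follow symlinks without checking who owns them
    Options FollowSymLinks
</Directory>

<Directory "/home/*/public_html">
    # Follow a symlink only when link and target have the same owner
    Options SymLinksIfOwnerMatch
</Directory>
```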

Content negotiation triggered by a wildcard such as "DirectoryIndex index" is costly. Instead, use a complete list of the expected matches:
DirectoryIndex index.cgi index.pl index.shtml index.html

Leave MinSpareServers, MaxSpareServers, and StartServers at their defaults unless you have a really good reason to change them. The same goes for KeepAliveTimeout (default: 15 seconds).
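
For reference, these are the Apache 1.3.x defaults to my knowledge; treat the values as a sketch, and check them against the documentation for your version:

```apache
# Process-pool settings, as I understand the 1.3.x defaults
MinSpareServers 5
MaxSpareServers 10
StartServers 5

# Seconds to wait for the next request on a persistent connection
KeepAliveTimeout 15
```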


Troubleshooting:

Check the error log, first.

If you've made serious errors in httpd.conf, Apache may die silently upon start. These can be hard to track down via normal means: You should in that case run the bare httpd binary from the command line (without using the SysVInit scripts), in order to prevent suppression of stdout and stderr output -- and see such output on the console. (Make sure you're the root user.)


Further Reading:

"The Apache Story", article in Linux Magazine by Rob McCool, Roy T. Fielding, and Brian Behlendorf
http://www.linux-mag.com/1999-06/apache_01.html

"Running a Web Server under Linux", article in Linux Magazine by Jim Dennis
http://www.linux-mag.com/1999-06/guru_01.html

"IBM and Apache Plan Their First Date", slide presentation by Manoj Kasichainula
http://www.io.com/~manojk/ibmapache/

Apache v. 1.3.x Documentation (only brave people so far run 2.0 betas)
http://httpd.apache.org/docs/

Apache FAQ
http://httpd.apache.org/docs/misc/FAQ.html


Notes:

[1] Andreessen and McCool invented for Mosaic and NCSA httpd the concept of URLs, previously unknown in CERN's software.

[2] Each project member can cast one vote, +1 or -1. Changing the codebase requires at least three plus votes and no minus ones. Other actions require at least three plus votes and an overall positive majority on votes cast.

[3] IBM's Java-oriented WebSphere product is also based on Apache.

[4] To my knowledge, you cannot unload an Apache DSO after runtime.

[5] There's a vast variety of modules provided in the stock Apache 1.3.x package, including mod_speling (yes, spelled that way), which detects and corrects common misspellings in URLs, and mod_mime_magic, which autodetects documents' correct MIME types using Unix "magic numbers" and other hints from the filesystem (instead of just filename extensions). Also mod_rewrite, which has unlimited ability to rewrite URLs in accordance with regular expressions. Also mod_proxy, providing much of the Squid Web cache's feature set, right inside Apache.

[6] On Red Hat, this would be in /etc/httpd/conf/ . On Debian, it would be in /etc/apache/ . You will also find some mentions of srm.conf and access.conf: What is now solely in Apache's httpd.conf file used to be split among all three files, for backwards compatibility with NCSA httpd. Recent Apache versions finally abandon that unnecessary complexity.

The same directory will also contain Apache's mime.types file, technically also part of Apache's configuration set.