Apache

From HerzbubeWiki


This page contains information about configuring the HTTP server software Apache.

There are currently two major versions of this software: a legacy 1.3 version and the current version 2. When I started using Apache I used the 1.3 version; at some point I was forced to switch over to version 2. Parts of this document may therefore still contain references to the 1.3 version, or describe a feature that works only with the 1.3 version.



References

Local/Internet documentation

http://pelargir.herzbube.ch/doc/apache2-doc/manual/
http://httpd.apache.org/docs/

Details about how Debian organizes Apache2 configuration:

/etc/apache2/README


System installation

Debian packages

apache2
apache2-common
apache2-mpm-worker

The packages above automatically install a large number of Apache modules that in earlier days needed to be installed separately. A notable example is the SSL module. Other modules still need to be installed as separate packages (e.g. PHP, svn).


Security

On Debian the server runs as user www-data, group www-data.
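These identities are configured with the User and Group directives. The following is a minimal sketch of how they would look in apache2.conf. Newer Debian packages do not hard-code the values there. Instead they set APACHE_RUN_USER and APACHE_RUN_GROUP in /etc/apache2/envvars and reference those variables.

```apache
# The parent process starts as root (so it can bind to ports 80/443);
# the child processes that serve requests switch to this user and group
User www-data
Group www-data
```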


Configuration

Overview

The directory that contains all the configuration files is

/etc/apache2

The main file that includes everything else is apache2.conf. It includes other files in the following order:

# Include module configuration:
Include /etc/apache2/mods-enabled/*.load
Include /etc/apache2/mods-enabled/*.conf

# Include all the user configurations:
Include /etc/apache2/httpd.conf

# Include ports listing
Include /etc/apache2/ports.conf

# Include generic snippets of statements
Include /etc/apache2/conf.d/

[...]

# Include the virtual host configurations:
Include /etc/apache2/sites-enabled/


I make use of this scheme in the following way:

  • Directory-based access rights and other general purpose configuration that needs to be loaded early on:
/etc/apache2/conf.d/pelargir.conf
  • SSL configuration
/etc/apache2/conf.d/pelargir-ssl.conf
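  • LDAP configuration (kept outside conf.d, because it is included explicitly wherever it is needed; see the LDAP section below)
/etc/apache2/pelargir-ldap.conf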
  • Virtual host configuration files in /etc/apache2/sites-available


/etc/apache2/conf.d/pelargir.conf

Access rights

My policy can be roughly summarized like this:

  • By default disable and deny everything
  • Enable and allow only those things that are explicitly needed

The following config snippet shows how I restrict the default access rights and turn off CGI execution and the PHP engine everywhere:

# ============================================================
# Configure the "default" to be a very restrictive set of
# permissions.
# ============================================================
<Directory />
  # This has 2 effects:
  # * any Allow directives are evaluated before the Deny directives. 
  # * in the absence of any explicit Allow/Deny directive, access
  #   will be denied
  Order allow,deny
  # In theory the above should already be sufficient, but hey, let's
  # be paranoid...
  Deny from all
  # Do not allow .htaccess files to override any of our settings
  AllowOverride None
  # Disable CGIs everywhere
  Options -ExecCGI
</Directory>

# ============================================================
# PHP configuration
# - by default we turn PHP execution off
# - we want to turn it back on explicitly for all packages
#   that we install
# ============================================================
<IfModule mod_php5.c>
  php_admin_flag engine off
  php_flag register_globals off
</IfModule>

An example of how to explicitly allow access and enable both CGI execution and the PHP engine for a specific directory:

<Directory /var/www/pelargir/>
  Allow from all
  Options +ExecCGI
  <IfModule mod_php5.c>
    php_admin_flag engine on
  </IfModule>
</Directory>


SSI configuration

Server side includes (SSI) are disabled by default. I turn them on for .shtml files like this:

# ============================================================
# SSI configuration
# ============================================================
<IfModule mod_mime.c>
  AddType text/html .shtml
  AddHandler server-parsed .shtml
  DirectoryIndex index.shtml
</IfModule>

In addition, any directory that wants to allow SSI needs to say something like this:

<Directory /var/www/pelargir>
  Options Includes
</Directory>

Note: An alternative is IncludesNOEXEC, which permits SSI but disables the #exec cmd and #exec cgi commands.
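In Apache 2, SSI processing is implemented as the INCLUDES output filter. The server-parsed handler used above is still accepted for backwards compatibility. Below is a sketch of the equivalent filter-based configuration, reusing /var/www/pelargir from the example above.

```apache
<IfModule mod_include.c>
  AddType text/html .shtml
  # Run .shtml files through the INCLUDES (SSI) output filter
  AddOutputFilter INCLUDES .shtml
</IfModule>

<Directory /var/www/pelargir>
  # The "+" merges Includes with the options inherited from enclosing
  # sections instead of replacing them; +IncludesNOEXEC would also work
  Options +Includes
</Directory>
```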


CGI scripts in /usr/lib/cgi-bin

Various Debian packages (e.g. awstats, mailman) install CGI scripts into the generic directory /usr/lib/cgi-bin. Apache must be configured so that it executes these scripts correctly. I do this in the following way (as usual with options that are not site-specific, I place them in /etc/apache2/conf.d/pelargir.conf):

# ============================================================
# CGI configuration
# - /usr/lib/cgi-bin should be available as /cgi-bin for all
#   Virtual Servers
# - however, we use the normal Alias instead of ScriptAlias
#   because we don't want every file in the directory to
#   automatically become a CGI
# - instead we add a CGI handler for some file extensions
#   and allow execution of CGIs in /cgi-bin with
#   "Options +ExecCGI"
# - some larger packages may require .htaccess files for
#   restricting access to parts of the package
# ============================================================
<IfModule mod_cgi.c>
  Alias /cgi-bin/ /usr/lib/cgi-bin/

  <Directory /usr/lib/cgi-bin/>
    Allow from all
    Options +ExecCGI
    AllowOverride All
    AddHandler cgi-script .cgi .sh .pl
  </Directory>
</IfModule> 


Documentation

The documentation in /usr/share/doc should be available via HTTP:

# ============================================================
# Documentation should be available for all Virtual Servers
# ============================================================
Alias /doc/ /usr/share/doc/
<Directory /usr/share/doc/>
  Allow from all
  Options FollowSymLinks Includes Indexes
</Directory>

# Fix a too restrictive set of options in the apache2-doc package
<Directory "/usr/share/doc/apache2-doc/manual/">
  Allow from 192.168.0.0/255.255.0.0
</Directory>


Options directive

The Options directive controls which server features are available in a particular directory. Normally, if multiple Options could apply to a directory, then the most specific one is used and others are ignored; the options are not merged. However if all the options on the Options directive are preceded by a "+" or "-" symbol, the options are merged.

  • ExecCGI is useful, but should be set only in rare cases, where it is really necessary
  • FollowSymLinks is probably the most useful setting
  • Includes is useful only where SSI are required
  • Indexes exposes too much of the system and should be applied only with care
  • MultiViews is arcane (to me); it is useful if the same resource is present in more than one way (e.g. multiple languages) and the client should be able to say which type of resource it understands/prefers
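The merging rule described above is easy to get wrong. This sketch uses hypothetical directories to show both cases.

```apache
<Directory /var/www/example>
  Options Indexes FollowSymLinks
</Directory>

# Every option carries "+" or "-": merged with the parent's options.
# Effective options here: FollowSymLinks Includes (Indexes removed)
<Directory /var/www/example/ssi>
  Options -Indexes +Includes
</Directory>

# No "+"/"-" prefixes: the inherited options are replaced entirely.
# Effective options here: Includes only (FollowSymLinks is lost)
<Directory /var/www/example/plain>
  Options Includes
</Directory>
```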


.htaccess files

While processing a request, Apache looks for a .htaccess file in every directory along the path to the requested document, provided that distributed configuration files are enabled for that directory. It is possible to completely disable .htaccess files by saying

AllowOverride None

for a given directory. It is also possible to allow only certain directives within .htaccess files. In this case AllowOverride lists the allowed directive groups (AuthConfig, FileInfo, Indexes, Limit, Options) rather than individual directives.


Note: .htaccess is only a default file name; if you wish you can define additional or completely different file names through the AccessFileName directive.
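Here is a sketch combining both points. It uses a hypothetical directory and a hypothetical access file name ".acl". Because the usual protection of files matching "^\.ht" no longer covers a renamed access file, the sketch also denies access to it explicitly.

```apache
<Directory /var/www/example>
  # Only directives from the AuthConfig and Indexes groups are honoured
  # in access files here; any other directive causes a server error
  AllowOverride AuthConfig Indexes
</Directory>

# Server-wide: look for ".acl" instead of ".htaccess" (hypothetical name)
AccessFileName .acl

# Never serve the access files themselves to clients
<Files ~ "^\.acl$">
  Order allow,deny
  Deny from all
</Files>
```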


Evaluation order

Configuration sections are merged in the following order:

  1. <Directory> sections (except those using regular expressions) and .htaccess files simultaneously; .htaccess files, if allowed, can override the settings of <Directory> sections
  2. <DirectoryMatch> sections and <Directory> sections with regular expressions
  3. <Files> and <FilesMatch> sections simultaneously
  4. <Location> and <LocationMatch> sections simultaneously

Within these groups, the following rules apply:

  • <Directory> sections are processed from the shortest directory to the longest (e.g. / before /doc); if two sections refer to the same directory, they are processed in the order in which they appear in the configuration
  • all other sections are processed in the order in which they appear in the configuration
  • sections inside a <VirtualHost> are applied after the corresponding sections outside the virtual host definition

Generally one can say: "last wins".
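Here is a small sketch of "last wins", assuming a hypothetical DocumentRoot /var/www/example. Because <Location> sections are merged after <Directory> sections, the charset configured for the URL path /legacy/ overrides the one configured for the enclosing directory.

```apache
DocumentRoot /var/www/example

# Applies to everything below /var/www/example ...
<Directory /var/www/example>
  AddDefaultCharset utf-8
</Directory>

# ... except URLs below /legacy/: <Location> is merged after
# <Directory>, so this setting wins there
<Location /legacy/>
  AddDefaultCharset iso-8859-1
</Location>
```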


Default character set / encoding

The following config snippet sets a default character set / encoding that Apache applies to resources with the content type text/plain or text/html:

<IfModule mod_mime.c>
  <Directory />
    AddDefaultCharset utf-8
  </Directory>
</IfModule>


CGI scripts

Options +ExecCGI says that, in addition to any other options, execution of CGI scripts is allowed in a certain directory. Without this option no CGI script will run in that directory; the only exception is a directory mapped with ScriptAlias (see below)!

Once CGI execution has been allowed via ExecCGI, there are various ways to mark files as CGI scripts:

  • ScriptAlias works exactly like Alias, but it additionally "blesses" the directory so that all (!) files inside the directory become CGIs.
  • AddHandler uses file extensions to mark files as CGI scripts. Example:
AddHandler cgi-script cgi pl
  • SetHandler cgi-script marks all files in the enclosing section (e.g. <Directory> or <FilesMatch>) as CGI scripts. This is roughly equivalent to ScriptAlias, but SetHandler is more flexible because it can be used in more contexts than ScriptAlias.
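The following sketch shows SetHandler in the two typical scopes. The directory names are hypothetical.

```apache
# Every file in this directory is executed as a CGI script
<Directory /var/www/example/scripts>
  Options +ExecCGI
  SetHandler cgi-script
</Directory>

# Only files matching the pattern are executed; others are served as-is
<Directory /var/www/example/mixed>
  Options +ExecCGI
  <FilesMatch "\.(cgi|pl)$">
    SetHandler cgi-script
  </FilesMatch>
</Directory>
```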


mod_rewrite

References

http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html
http://httpd.apache.org/docs/2.2/misc/rewriteguide.html


Overview

In order to use the module, the following directive is necessary:

RewriteEngine on

The engine is turned on/off per directory, or per virtual host.

The most commonly used directive is RewriteRule. It takes two arguments, optionally followed by a list of flags in square brackets:

  • a regular expression that is used to match an address
  • a result to which the matching address should be rewritten

Example:

RewriteRule ^/$ /index.html
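Rewrite settings of the main server are not inherited by virtual hosts by default. For that reason, RewriteEngine has to be switched on inside every <VirtualHost> that uses rewriting. The sketch below places the example rule in context; the host name and document root are hypothetical.

```apache
<VirtualHost *:80>
  ServerName www.example.com
  DocumentRoot /var/www/example
  # Not inherited from the main server: enable it per virtual host
  RewriteEngine on
  # Serve /index.html when the bare root URL is requested
  RewriteRule ^/$ /index.html
</VirtualHost>
```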

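Unless told otherwise, a RewriteRule in server or virtual host context maps its result directly to a file (usually relative to DocumentRoot) and bypasses mod_alias. This means that the result cannot refer to an alias. For instance, the following does not work: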

Alias /xyz/ /abc/def/
RewriteRule ^/$ /xyz/index.html

The reason for this behaviour is explained at the end of the mod_rewrite documentation:

http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#RewriteBase
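The usual way around this is the PT ("passthrough") flag. It hands the rewritten URL back to the URL mapping phase, which gives Alias and ScriptAlias a chance to translate it. Below is a sketch using the paths from the failing example above.

```apache
Alias /xyz/ /abc/def/
RewriteEngine on
# [PT] passes /xyz/index.html on to mod_alias, which then maps it
# to the file /abc/def/index.html
RewriteRule ^/$ /xyz/index.html [PT]
```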

Note: The directive RewriteBase can only be used inside a .htaccess file or within a <Directory> directive.


Redirects

I currently know of three ways to perform a redirect:

  • inside an .html document
<META HTTP-EQUIV="refresh" CONTENT="0; URL=http://foo.bar/service">
  • with mod_alias
Redirect /service http://foo.bar/service
  • with mod_rewrite
RewriteRule ^/$ /phpws/index.php [R,L]

Some explanations:

  • the flag R means "Redirect"; the server will send the HTTP status code 302 ("moved temporarily"); in order to send a different status code, one needs to specify R=<statuscode>
  • the flag L means "Last", i.e. no further RewriteRules are applied after the current one
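For a permanent redirect (status code 301), both modules accept an explicit status. The following sketch reuses the URLs from above. Note one difference between the two forms. Redirect matches URL prefixes and appends the remaining path (/service/a.html goes to http://foo.bar/service/a.html). The RewriteRule, by contrast, only matches /service exactly.

```apache
# mod_alias: permanent redirect
Redirect permanent /service http://foo.bar/service

# mod_rewrite: the same status code via R=301
RewriteEngine on
RewriteRule ^/service$ http://foo.bar/service [R=301,L]
```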


Query strings

Query strings are the part at the end of a URL that starts with a question mark character ("?"). Apache processes query strings in a special way: they are not part of the URL path that a RewriteRule pattern is matched against. For this reason, query strings cannot be matched by a RewriteRule!

Apache stores query strings in the variable QUERY_STRING. In order to test for the presence of a query string, one has to use a RewriteCond directive that processes the variable QUERY_STRING. For instance:

RewriteCond %{QUERY_STRING} ^q=redirect_url=
RewriteRule <...>

Note: The "?" character is not part of the QUERY_STRING variable.

After a RewriteRule has performed its substitution task, the original query string is appended to the result. For instance:

RewriteCond %{QUERY_STRING} ^q=redirect_url=
RewriteRule .* /foobar/ [R,L]

This replaces everything (".*") by "/foobar/" and appends the original query string, thus producing the URL /foobar/?q=redirect_url=.

If the substitution part of a RewriteRule already contains a new query string, this new query string replaces the original query string. For instance:

RewriteCond %{QUERY_STRING} ^q=redirect_url=
RewriteRule .* /foo/?q=bar [R,L]

This rule produces the URL /foo/?q=bar.

In order to "combine" the old with the new query string the flag QSA can be used:

RewriteCond %{QUERY_STRING} ^q=redirect_url=
RewriteRule .* /foo/?q=bar [R,L,QSA]

This rule produces the URL /foo/?q=bar&q=redirect_url=. The query strings are joined by an ampersand ("&") character.

To entirely remove a query string, the last character in the substitution string of a RewriteRule must be a "?" character. For instance:

RewriteCond %{QUERY_STRING} ^q=redirect_url=
RewriteRule .* /foobar/? [R,L]

This rule produces the URL /foobar/.

To re-use only parts of a query string you have to use backreferences:

RewriteCond %{QUERY_STRING} ^q=redirect_url=(.*)
RewriteRule .* %1? [R,L]

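This rule replaces the old URL by a new one. The new URL consists of the part of the old query string that follows "redirect_url=". %1 is the backreference to the (.*) group in the RewriteCond directive (%N always refers to the last RewriteCond that matched). The trailing "?" prevents the original query string from being appended again.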
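The same technique can turn a query parameter into a path. In the following sketch, the script name index.php and the parameter page are hypothetical. A request for /index.php?page=about is permanently redirected to /about/.

```apache
RewriteEngine on
# Capture the parameter value; only lowercase letters are accepted
RewriteCond %{QUERY_STRING} ^page=([a-z]+)$
# %1 = captured value; the trailing "?" drops the old query string
RewriteRule ^/index\.php$ /%1/? [R=301,L]
```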


Virtual Hosts

Virtual Host (or vhost) configuration is available in a separate page on this wiki: ApacheVirtualHost.


SSL

Overview

In order to allow https connections, Apache needs to listen on a port other than the default HTTP port 80; the default https port is 443. Apache also needs a server certificate that it can hand out to clients (e.g. web browsers). Clients can use this certificate a) to verify whether the server actually is who it claims to be, and b) to set up encryption of the communication. The page ServiceEncryptionWithSSL contains instructions on how to create and sign server certificates.

The most important directives are

  • SSLEngine on turns on SSL for a specific virtual host
  • SSLCertificateFile and SSLCertificateKeyFile are used to specify the location of the certificate file and the RSA key file
  • SSLCertificateChainFile is used to specify the location of a file that contains a chain of parent certificates, up to the root certificate
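The following is a minimal sketch of an SSL-enabled virtual host that combines these directives. The IP address matches the one used elsewhere on this page. The host name and all certificate paths are hypothetical.

```apache
<IfModule mod_ssl.c>
  <VirtualHost 192.168.0.2:443>
    ServerName www.example.com
    DocumentRoot /var/www/example
    SSLEngine on
    # Server certificate and its private key
    SSLCertificateFile      /etc/ssl/certs/www.example.com.crt
    SSLCertificateKeyFile   /etc/ssl/private/www.example.com.key
    # Intermediate CA certificates up to the root
    SSLCertificateChainFile /etc/ssl/certs/example-ca-chain.crt
  </VirtualHost>
</IfModule>
```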


SSL and name-based virtual hosts

SSL and name-based virtual hosts are on somewhat "unfriendly terms", which means that normally it is not possible to define multiple name-based hosts all on the same port 443. A good explanation of the problem can be found at http://httpd.apache.org/docs-2.0/ssl/ssl_faq.html, although it should be obvious why the problem exists if you know

  1. About protocol layers
  2. That the HTTP layer comes on top of the SSL layer, and
  3. That name-based virtual hosts are based on the HTTP Host: header


The consequence is that normally you can have only one SSL-enabled virtual host on any given combination of IP address and TCP port. Instead of "normally", I should probably say "traditionally", because these days a solution to the problem exists: a TLS extension named SNI (Server Name Indication). Wikipedia has good information about the subject. The gist is that a client that uses the extension tells Apache which server name it wants to connect to as part of the TLS handshake. This happens before the encrypted connection is established and before any HTTP data (including the Host: header) is sent. It allows Apache to choose the correct name-based virtual host, whose configuration indicates the correct SSL certificate to serve.
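Below is a sketch of two name-based SSL virtual hosts on the same port, written in Apache 2.2 syntax. SNI support in mod_ssl requires Apache 2.2.12 or later and an OpenSSL library with TLS extension support. Clients that do not send SNI receive the first virtual host listed. Host names and certificate paths are hypothetical.

```apache
# Required in Apache 2.2 for name-based virtual hosts on port 443
NameVirtualHost *:443

<VirtualHost *:443>
  # First listed: also the fallback for clients without SNI
  ServerName www.example.com
  SSLEngine on
  SSLCertificateFile    /etc/ssl/certs/www.example.com.crt
  SSLCertificateKeyFile /etc/ssl/private/www.example.com.key
</VirtualHost>

<VirtualHost *:443>
  ServerName www.example.org
  SSLEngine on
  SSLCertificateFile    /etc/ssl/certs/www.example.org.crt
  SSLCertificateKeyFile /etc/ssl/private/www.example.org.key
</VirtualHost>
```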


/etc/apache2/conf.d/pelargir-ssl.conf

The file named in the section heading contains global SSL configuration options that are the same for all SSL-enabled virtual hosts.

<IfModule mod_ssl.c>
# Disabled because port 443 is already enabled in /etc/apache2/ports.conf
#  Listen 443
  Listen 192.168.0.2:901

  # Some MIME-types for downloading Certificates and CRLs
  AddType application/x-x509-ca-cert .crt
  AddType application/x-pkcs7-crl    .crl

  # Semaphore:
  #   Configure the path to the mutual exclusion semaphore the
  #   SSL engine uses internally for inter-process synchronization.
  SSLMutex file:/var/run/mod_ssl_mutex

  # Inter-Process Session Cache:
  #   Configure the SSL Session Cache: First either `none'
  #   or `dbm:/path/to/file' for the mechanism to use and
  #   second the expiring timeout (in seconds).
  SSLSessionCache         dbm:/var/run/mod_ssl_scache
  SSLSessionCacheTimeout  300
  #SSLSessionCache         none

  # Pseudo Random Number Generator (PRNG):
  #   Configure one or more sources to seed the PRNG of the
  #   SSL library. The seed data should be of good random quality.
  SSLRandomSeed startup file:/dev/urandom 512
  SSLRandomSeed connect file:/dev/urandom 512

  # Logging:
  #   The home of the dedicated SSL protocol logfile. Errors are
  #   additionally duplicated in the general error log file.  Put
  #   this somewhere where it cannot be used for symlink attacks on
  #   a real server (i.e. somewhere where only root can write).
  #   Log levels are (ascending order: higher ones include lower ones):
  #   none, error, warn, info, trace, debug.
  #SSLLog /var/log/apache2/ssl_engine.log
  #SSLLogLevel info
</IfModule>


Authentication / Authorization

References

Overview 
http://httpd.apache.org/docs/2.2/howto/auth.html
mod_auth_basic 
http://httpd.apache.org/docs/2.2/mod/mod_auth_basic.html
mod_auth_digest 
http://httpd.apache.org/docs/2.2/mod/mod_auth_digest.html
Require directive 
http://httpd.apache.org/docs/2.2/mod/core.html#require
mod_authnz_ldap 
http://httpd.apache.org/docs/2.2/mod/mod_authnz_ldap.html


Overview

Sometimes a resource made available by the web server needs to be protected from public access. These are the configuration steps necessary to do this:

  • generate a password file (typically named .htpasswd) that contains a list of users that are allowed to access the resource, and the users' passwords
  • configure Apache
    • tell Apache that authentication is required to access the resource
    • tell Apache where the password file is
    • tell Apache which users in the password file are authorized to access the resource
  • this configuration is done through directives that are located
    • either inside the main server configuration files
    • or inside an access file (typically named .htaccess); in this case the directory where the access file is located needs to be configured with "AllowOverride AuthConfig"
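The following sketch walks through these steps with mod_authn_file. The directory, password file path and user names are hypothetical. The password file could be created with "htpasswd -c /etc/apache2/passwords alice" (-c only when creating the file). It should live outside any document root.

```apache
# Main server configuration: allow access (the default on this page
# is "Deny from all") and let .htaccess files contain auth directives
<Directory /var/www/example/private>
  Allow from all
  AllowOverride AuthConfig
</Directory>

# Contents of /var/www/example/private/.htaccess:
AuthType Basic
AuthName "Private area"
AuthBasicProvider file
AuthUserFile /etc/apache2/passwords
# Only these users from the password file are authorized
Require user alice bob
```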


When a web client tries to access a protected resource the following happens:

  • the web server sends a challenge to the web client to authenticate itself
  • if the web client is a web browser, it will usually ask the user for a username and a password
  • the username/password entered is sent back to the web server
    • if basic authentication is used, the password is sent in clear text (it is merely Base64-encoded)
    • if digest authentication is used, the password itself is not sent; instead the client sends an MD5 hash computed from the password, a server-supplied nonce and other request data
    • if the connection between web client and web server is not encrypted, the clear text password or the password hash is visible to anyone who can view network traffic between server and client
  • the web server looks up the entry in the password file that matches the username submitted by the web client
  • if there is no entry, authentication fails and access is denied
  • if there is an entry, the web server compares the entry's password to the clear text password or password hash submitted by the web client
  • if the passwords do not match, authentication fails and access is denied
  • if the passwords match, authentication succeeds
  • as the final step, the web server checks whether the user that just authenticated itself is authorized to access the resource
  • if the user is not authorized, access is denied
  • if the user is authorized, access is granted


Framework used by Apache2

Each authentication / authorization use case is governed by the following framework

  • the configuration must specify which authentication type is to be used
    • basic authentication = mod_auth_basic module
    • digest authentication = mod_auth_digest module
  • both modules are considered front-ends that delegate actual authentication and authorization to one of several possible authentication providers and authorization providers
  • authentication providers
    • mod_authn_file module: if user names and passwords are located in a password file
    • mod_authnz_ldap module: if user names and passwords are located in an LDAP directory
  • authorization providers
    • if authentication provider was mod_authn_file, there usually is no authorization provider because usually authorization is set to succeed automatically if authentication succeeds
    • mod_authnz_ldap module: can be used to fine-tune authorization by looking up rules located in an LDAP directory
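As an illustration of the front-end/provider split, here is a sketch of digest authentication (front-end mod_auth_digest) with mod_authn_file as its provider. The URL path, realm and file name are hypothetical. The password file would be created with "htdigest -c /etc/apache2/digest-passwords "Private area" alice". The realm given there must match AuthName.

```apache
<Location /private/>
  # Front-end: digest authentication
  AuthType Digest
  AuthName "Private area"
  # Provider: user names and password hashes from a file
  AuthDigestProvider file
  AuthUserFile /etc/apache2/digest-passwords
  Require valid-user
</Location>
```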


mod_authn_file

TBD


LDAP

Why not PAM?

There exists a Debian package libapache2-mod-auth-pam which promises authentication through PAM. The package is based on what can be found at

http://pam.sourceforge.net/mod_auth_pam/

The web site says that the module is no longer maintained, therefore I prefer using the well-maintained module mod_authnz_ldap.


mod_authnz_ldap

Note: mod_authnz_ldap does not support digest authentication!

mod_authnz_ldap does its job in two phases:

  • authentication phase (aka search/bind phase); in this phase mod_authnz_ldap does the following things:
    • connect to the LDAP directory using a special "search" DN
    • search for an entry that matches the username submitted by the web client
    • if a single unique match is found, try to bind to the LDAP server using the entry and the password submitted by the web client
    • if binding is successful, authentication is considered to be successful
  • authorization phase (aka compare phase); in this phase mod_authnz_ldap does the following things:
    • what happens in this phase depends on whether a Require directive is present in the server configuration or .htaccess file, and what the directive specifies
    • if no Require directive exists, no authentication or authorization takes place at all, i.e. access is granted without asking for credentials
    • Require valid-user = authorization succeeds automatically if authentication was successful; for this to work, the mod_authz_user module must be loaded and the AuthzLDAPAuthoritative directive must be set to off
    • Require ldap-user|ldap-group|ldap-dn|ldap-attribute|ldap-filter = authorization succeeds if a search of the LDAP directory using the specified user, group, DN, attribute or general LDAP filter produces the proper results
    • consult the docs for mod_authnz_ldap for details


LDAP connection information should be stored only once, especially because the information contains a password that needs to be protected. To achieve this, I store the information in a separate file that I include wherever I need it:

pelargir:/etc/apache2# l pelargir-ldap.conf 
-r-------- 1 root root 259 Oct  1 21:43 pelargir-ldap.conf

pelargir:/etc/apache2# cat pelargir-ldap.conf 
# ============================================================
# One-time definition of LDAP connection and other information.
# This information should be included whenever LDAP
# authentication and authorization is used.
# ============================================================
<IfModule mod_authnz_ldap.c>
  AuthLDAPUrl ldap://127.0.0.1:389/ou=users,dc=herzbube,dc=ch?uid?sub?(objectClass=*)
  AuthLDAPBindDN cn=readonly-users,ou=users,dc=herzbube,dc=ch
  AuthLDAPBindPassword secret
  # Tell mod_authnz_ldap to use attribute "memberUid" to check
  # for group membership
  AuthLDAPGroupAttribute memberUid
  # Tell mod_authnz_ldap that the group's memberUid attribute
  # contains simple user names (e.g. "foo"), not the whole user DN
  # (e.g. "cn=foo,ou=users,dc=herzbube,dc=ch")
  AuthLDAPGroupAttributeIsDN off
</IfModule>


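To protect a directory, use the following configuration in the main server configuration. The directives inside the <Directory> section could also be placed in a .htaccess file, except for the Include directive, which is not allowed in .htaccess files: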

<Directory /foo/>
  Include pelargir-ldap.conf
  AuthName "bar"
  AuthType Basic
  AuthBasicProvider ldap
  Require valid-user
  AuthzLDAPAuthoritative off
</Directory>
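Access can also be limited to the members of an LDAP group instead of every user who can authenticate. This sketch assumes a hypothetical group entry cn=webusers,ou=groups,dc=herzbube,dc=ch whose memberUid attributes list plain user names, which matches the AuthLDAPGroupAttribute and AuthLDAPGroupAttributeIsDN settings shown above. AuthzLDAPAuthoritative stays at its default, because mod_authnz_ldap itself handles ldap-group.

```apache
<Directory /foo/>
  Include pelargir-ldap.conf
  AuthName "bar"
  AuthType Basic
  AuthBasicProvider ldap
  # Hypothetical group DN: only its members are authorized
  Require ldap-group cn=webusers,ou=groups,dc=herzbube,dc=ch
</Directory>
```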

Note: It seems to be impossible to specify the LDAP connection information for <Directory /> and to let other directories inherit it. Issue 45946 at issues.apache.org describes the problem. For the moment, I have therefore chosen to work around the problem with Apache's Include directive, which at least prevents duplication of the connection information.


mod_ldap

mod_ldap provides the actual LDAP support required by mod_authnz_ldap, as well as a caching mechanism to improve web server performance. The module's defaults are basically OK, but I like to add a URL that allows inspection of the cache. The following configuration block in pelargir.conf achieves this:

# ============================================================
# LDAP configuration
# - the default values for mod_ldap are OK (1024 cache entries, TTL
#   for entries = 600 seconds)
# - add a cache information URL so that the cache status can be
#   inspected
#
# Note 1: During testing it might be useful to reduce the longevity
# of cache entries by saying "LDAPCacheTTL 5" (entries expire
# after 5 seconds) or "LDAPCacheEntries 0" (cache size is zero).
#
# Note 2: The file name to use with IfModule to check for the presence
# of mod_ldap is "util_ldap.c" (not "mod_ldap.c" as one would expect).
# Alternatively, the module identifier that can also be used with
# IfModule since Apache 2.1 is "ldap_module".
# ============================================================
<IfModule ldap_module>
  <Location /ldapcache-info>
    SetHandler ldap-status
    Allow from all
  </Location>
</IfModule>


PHP

Debian packages

php5
libapache2-mod-php5

References

http://wiki.debian.org/PHP
/usr/share/doc/php5/README.Debian.gz


PHP modules

PHP 4          PHP 5          Needed by (depends, recommends, suggests)
php4-cgi       php5-cgi       gallery2, phpmyadmin, squirrelmail, websvn
php4-cli       php5-cli       other PHP modules
php4-gd        php5-gd        gallery2, phpmyadmin
php4-ldap      php5-ldap      squirrelmail, phpldapadmin
php4-mcrypt    php5-mcrypt    phpmyadmin
php4-mysql     php5-mysql     gallery2, phpmyadmin
php4-pear      php5-pear      squirrelmail


Note: After a Debian package for a PHP module has been installed, it may be necessary to reload/restart Apache to make the module available.


php5-gd

Installing the php5-gd package leads to a series of other Debian packages being installed. The biggest surprise here is that some of these packages are X11 packages:

  • php5-gd depends on libgd2-xpm (replacing libgd2-noxpm)
  • libgd2-xpm depends on libxpm4 and libx11-6
  • libxpm4 depends on x11-common
  • etc.


php.ini

PHP exists in 3 "versions":

  • the command line (CLI) version
  • the CGI version
  • the Apache module version


Depending on the version being executed, PHP locates its configuration in different directories:

/etc/php5/cli
/etc/php5/cgi
/etc/php5/apache2

Within each directory, PHP looks for a central configuration file php.ini. In addition, PHP scans the sub-directory conf.d (if it exists) and treats any file in it whose name ends in .ini as a configuration file. By default, conf.d in each of the three directories is a symlink that points to a shared directory containing the common configuration files. That shared directory is located here:

/etc/php5/conf.d

This makes it easy to modify the PHP configuration: Simply drop a new file into the conf.d directory instead of modifying the central configuration file php.ini. For instance, Debian packages for PHP modules drop a module-specific configuration file into conf.d. Each module's configuration file then at least "activates" the module through an "extension=" statement. For instance:

pelargir:/etc/php5/conf.d# cat gd.ini 
# configuration for php GD module
extension=gd.so


Web browser problems

Sometimes you have enabled PHP for a directory, and everything seems to be configured just right, but the web browser still tries to download the .php file instead of letting the web server "execute" it. If this happens, it might help to a) click the browser's "reload" button, or b) clear the browser's cache.


Upgrading to Apache 2

References

http://www.linux.com/article.pl?sid=05/09/01/186204
http://httpd.apache.org/docs/2.0/upgrading.html

Personal notes

The upgrade became necessary when I wanted to use the Subversion module for Apache, which only exists for Apache 2. The Debian package for the module is libapache2-svn. Installing the module package pulled in the following dependencies:

apache2
apache2-common
apache2-mpm-worker

Apache 1.3 was still installed and running on port 80. Because of this conflict, Apache 2 was installed in a deactivated state, and it was also not possible to start the Apache 2 daemon manually. In order to be able to test the new configuration while the old daemon was still running and serving requests from the Internet, I changed the port on which Apache 2 listens to 8080. I did this by modifying

/etc/apache2/ports.conf
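Below is a sketch of the temporary ports.conf used during such a migration. If the site files contain <VirtualHost *:80> sections, those would need the same port change. The same applies to a matching NameVirtualHost *:80 line.

```apache
# /etc/apache2/ports.conf (temporary): Apache 1.3 keeps port 80,
# Apache 2 is tested on port 8080
Listen 8080
```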

I was then able to gradually change and test the Apache 2 configuration until it was up to the state I had with Apache 1.3. This involved the following steps:

  • enable various modules with the command line utility a2enmod; I took some hints about which modules need to be enabled from the Apache 1.3 module configuration in /etc/apache/modules.conf
  • create sub-directories in /var/log/apache2 for the log files of the various virtual hosts
  • modify the various site configuration files in /etc/awstats so that the LogFile option points to the new Apache 2 log directories
  • make sure that already installed packages that are based on Apache have their configuration updated for Apache 2
    • sometimes this was as simple as saying dpkg-reconfigure <package>; the DebConf process then detected that Apache 2 was available and prompted me for the version of Apache that I wanted to install a configuration for
    • sometimes I had to manually create a symlink, such as
ln -s /etc/gallery/apache.conf /etc/apache2/conf.d/gallery.conf
  • install the Debian package libapache2-mod-php4; this caused apache2-mpm-worker to be replaced by apache2-mpm-prefork
  • modify /etc/php4/apache2/php.ini so that it contains the following lines
extension=ldap.so
extension=mysql.so
extension=domxml.so
extension=gd.so
extension=mcrypt.so
extension=imap.so

When I was finished with the upgrade process, I reset the port in /etc/apache2/ports.conf to 80, changed the file /etc/default/apache2 to contain the string NO_START=0 and uninstalled the old Apache 1.3 package.


Troubleshooting

Address already in use

The following error message on startup

(98)Address already in use: make_sock: could not bind to address [::]:443

means that the daemon tries to bind to the same address/port several times. I encountered this problem during an upgrade where the "Listen 443" directive appeared twice, once in /etc/apache2/ports.conf and once in /etc/apache2/conf.d/pelargir-ssl.conf
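The fix is to open each port in exactly one place. Here is a sketch of a ports.conf that takes over port 443 conditionally. With this in place, the "Listen 443" line in pelargir-ssl.conf stays commented out, as shown in the SSL section above.

```apache
# /etc/apache2/ports.conf: the only place where ports are opened
Listen 80
<IfModule mod_ssl.c>
  Listen 443
</IfModule>
```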


NameVirtualHost *:80 has no VirtualHosts

When the message "NameVirtualHost *:80 has no VirtualHosts" appears on startup, there are two possible causes:

  • the directive "NameVirtualHost *:80" appears multiple times in the configuration
  • the VirtualHost "*:80" does not appear anywhere in the configuration; a VirtualHost "*", i.e. without port number, is not sufficient
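Here is a sketch of a consistent Apache 2.2 setup that avoids both causes. The host name and document root are hypothetical.

```apache
# Exactly once in the whole configuration
NameVirtualHost *:80

# Address and port must match the NameVirtualHost argument exactly;
# <VirtualHost *> would not be matched
<VirtualHost *:80>
  ServerName www.example.com
  DocumentRoot /var/www/example
</VirtualHost>
```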


Log file analyzer

Overview

There are several log file analyzers out there. Three popular ones of which I have heard and that are also available from Debian are:

  • awstats
  • analog
  • webalizer

An interesting comparison table can be found at

http://awstats.sourceforge.net/docs/awstats_compare.html

After taking a short look at each package I discovered that neither analog nor webalizer has seen any development activity for quite some time now. The last analog release happened in December 2004, the last webalizer release in April 2002.

One could argue that log file formats have been standardized a long time ago and that these packages are stable and do not need any further development. And yet - for some strange reason, if I have to choose between old and supposedly stable software, and newer, more actively developed and supported software, I tend to choose the newer alternative.

In short, I use awstats, whose latest release (at the time of writing this section) was in January 2007.


Debian package

The Debian package to install is

awstats


Configuration

The following configuration steps use my model domain herzbube.ch. Other sites/domains can be configured in the same way.

Copy the configuration example:

cp /usr/share/doc/awstats/examples/awstats.model.conf.gz /etc/awstats/awstats.www.herzbube.ch.conf.gz
gunzip /etc/awstats/awstats.www.herzbube.ch.conf.gz


Modify the awstats.www.herzbube.ch.conf file:

LogFile="/usr/share/awstats/tools/logresolvemerge.pl /var/log/apache2/herzbube.ch/access.log* |"
SiteDomain="www.herzbube.ch"
HostAliases="REGEX[.+\.herzbube\.ch] herzbube.ch 127.0.0.1 localhost"
AllowToUpdateStatsFromBrowser=0
AllowFullYearView=3

Notes:

  • AllowToUpdateStatsFromBrowser is disabled because statistics updates are triggered automatically by cron
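The modifications above can also be scripted, which is handy when several sites are configured the same way. The following minimal sketch assumes that each setting sits on its own line starting with Key= (as in the model file); commented-out lines are left alone. The helper name set_awstats_option is hypothetical. The value is handed to awk through the environment, so the "|" in LogFile and the backslashes in HostAliases arrive unchanged.

```shell
#!/bin/bash
# Hypothetical helper: set "Key=Value" in an awstats config file.
# Every line starting with "Key=" is replaced; if there is none, the
# setting is appended. The value is copied literally (no escaping).
set_awstats_option() {
  local file key value tmp
  file=$1 key=$2 value=$3
  tmp=$(mktemp) || return 1
  KEY="$key" VALUE="$value" awk '
    BEGIN { k = ENVIRON["KEY"]; v = ENVIRON["VALUE"]; found = 0 }
    index($0, k "=") == 1 { print k "=" v; found = 1; next }
    { print }
    END { if (!found) print k "=" v }
  ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Write back through the existing file to keep its owner and mode.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For example, "set_awstats_option /etc/awstats/awstats.www.herzbube.ch.conf AllowFullYearView 3". For string settings the value must include the double quotes, e.g. '"www.herzbube.ch"' for SiteDomain.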


Next we need to modify the Apache configuration to contain the following (I place this in /etc/apache2/conf.d/pelargir.conf):

<Directory /usr/share/awstats>
  Options None
  AllowOverride None
  Order allow,deny
  Allow from all
</Directory>
<Directory /usr/share/doc/awstats/examples>
  Options None
  AllowOverride None
  Order allow,deny
  Allow from all
</Directory>
Alias /awstatscss/ /usr/share/doc/awstats/examples/css/
Alias /icon/ /usr/share/awstats/icon/

Note: The actual awstats.pl script is located in /usr/lib/cgi-bin and must be executed by the web server from there. If you have not yet configured that directory for CGI execution, you must do so now. See the section further up that describes how to do this.


Statistics update

The awstats package includes the following cron file:

/etc/cron.d/awstats

which contains an entry that triggers the awstats statistics update process for all sites that have a configuration file with a name that matches the following pattern:

/etc/awstats/awstats.*conf


By default, users can also trigger the statistics update manually when they access awstats.pl via a web browser. However, I have disabled this by setting AllowToUpdateStatsFromBrowser to 0 in /etc/awstats/awstats.conf.
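To run the same kind of update by hand, for example right after adding a new site configuration, the config name can be derived from each file name and passed to awstats.pl. This is a sketch of the idea, not the Debian cron entry itself. The function names are hypothetical, the mapping from awstats.NAME.conf to -config=NAME is an assumption (the shared /etc/awstats/awstats.conf is not listed), and the script path /usr/lib/cgi-bin/awstats.pl is taken from the note further up. -config and -update are regular awstats.pl options.

```shell
#!/bin/bash
# Hypothetical helper: print the config name for every file
# awstats.NAME.conf in a directory (default /etc/awstats), e.g.
# awstats.www.herzbube.ch.conf -> www.herzbube.ch
awstats_config_names() {
  local dir f name
  dir=${1:-/etc/awstats}
  for f in "$dir"/awstats.*.conf; do
    [ -f "$f" ] || continue   # the glob matched nothing
    name=${f##*/}             # strip the directory part
    name=${name#awstats.}     # strip the "awstats." prefix
    name=${name%.conf}        # strip the ".conf" suffix
    printf '%s\n' "$name"
  done
}

# Hypothetical helper: update the statistics of every site as user
# www-data, like the cron job. Start it as root so that sudo does not
# prompt; stdin is redirected so sudo cannot consume the name list.
awstats_update_all() {
  local name
  awstats_config_names "$@" | while read -r name; do
    sudo -u www-data /usr/lib/cgi-bin/awstats.pl -config="$name" -update </dev/null
  done
}
```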


Permissions

The Debian package for Apache installs a configuration file for logrotate. At the time of writing, its default content looks like this:

pelargir:~# cat /etc/logrotate.d/apache2 
/var/log/apache2/*.log {
        weekly
        missingok
        rotate 52
        compress
        delaycompress
        notifempty
        create 640 root adm
        sharedscripts
        postrotate
                /etc/init.d/apache2 reload > /dev/null
        endscript
}


The above logrotate configuration contains the statement

create 640 root adm

This statement makes sure that, after rotation, the log file is re-created with restricted permissions. The intent is to protect the log file, which might contain sensitive information, from prying eyes. Unfortunately, those restricted permissions also prevent the awstats statistics update script from reading the log files, because the update script always runs as the user www-data:

  • When the user manually triggers a statistics update via browser, the update script runs using the permissions of the web server
  • When cron triggers the update, the cron config file /etc/cron.d/awstats specifies to also run the update as user www-data


This problem can be solved in several ways:

  • Add user www-data to group adm. The drawback of this method is that www-data can now read every file that is readable by that group, which might be more than intended.
  • Change the logrotate permissions to 644. This also requires that the permissions of the subfolders of /var/log/apache2 are changed to 755, so that the awstats update script can enter those folders (a directory needs the execute bit to be traversed). The drawback of this method is that everybody who has access to the system can now read the web server logs. Also, the changes might be reverted by a future update of the apache package.
  • Change the awstats cron file so that cron runs the update script as root. The drawback of this method is that the update script now has write access to the entire system. Also, the change might be reverted by a future update of the awstats package.


For the moment, I have decided to add www-data to the adm group, because this seems to be the change with the least impact.
