Apache
From HerzbubeWiki
This page contains information about configuring the HTTP server software Apache.
There currently are two major versions of this software: a legacy 1.3 version and the current version 2. When I started using Apache I did so with the 1.3 version; at some point I was forced to switch over to version 2. Parts of this document may therefore still contain references to the 1.3 version, or describe a feature that works only with the 1.3 version.
References
Local/Internet documentation
http://pelargir.herzbube.ch/doc/apache2-doc/manual/
http://httpd.apache.org/docs/
Details about how Debian organizes Apache2 configuration:
/etc/apache2/README
System installation
Debian packages
apache2 apache2-common apache2-mpm-worker
The packages above automatically install a large number of Apache modules which in earlier days needed to be installed separately. A notable example is the SSL module. Other modules still need to be installed as a separate package (e.g. PHP, svn).
Security
On Debian the server runs as user www-data, group www-data.
Configuration
Overview
The directory that contains all the configuration files is
/etc/apache2
The main file that includes everything else is apache2.conf. It includes other files in the following order:
# Include module configuration:
Include /etc/apache2/mods-enabled/*.load
Include /etc/apache2/mods-enabled/*.conf
# Include all the user configurations:
Include /etc/apache2/httpd.conf
# Include ports listing
Include /etc/apache2/ports.conf
# Include generic snippets of statements
Include /etc/apache2/conf.d/
[...]
# Include the virtual host configurations:
Include /etc/apache2/sites-enabled/
I make use of this scheme in the following way:
- Directory-based access rights and other general purpose configuration that needs to be loaded early on:
/etc/apache2/conf.d/pelargir.conf
- SSL configuration
/etc/apache2/conf.d/pelargir-ssl.conf
- LDAP configuration
/etc/apache2/conf.d/pelargir-ldap.conf
- Virtual host configuration files in /etc/apache2/sites-available
/etc/apache2/conf.d/pelargir.conf
Access rights
My policy can be roughly summarized like this:
- By default disable and deny everything
- Enable and allow only those things that are explicitly needed
The following config snippet shows how I restrict the default access rights and turn off CGI execution and the PHP engine everywhere:
# ============================================================
# Configure the "default" to be a very restrictive set of
# permissions.
# ============================================================
<Directory />
# This has 2 effects:
# * any Allow directives are evaluated before the Deny directives.
# * in the absence of any explicit Allow/Deny directive, access
# will be denied
Order allow,deny
# In theory the above should already be sufficient, but hey, let's
# be paranoid...
Deny from all
# Do not allow .htaccess files to override any of our settings
AllowOverride None
# Disable CGIs everywhere
Options -ExecCGI
</Directory>
# ============================================================
# PHP configuration
# - by default we turn PHP execution off
# - we want to turn it back on explicitly for all packages
# that we install
# ============================================================
<IfModule mod_php5.c>
php_admin_flag engine off
php_flag register_globals off
</IfModule>
An example of how to explicitly allow access and enable CGI execution and the PHP engine for a specific directory:
<Directory /var/www/pelargir/>
Allow from all
Options +ExecCGI
<IfModule mod_php5.c>
php_admin_flag engine on
</IfModule>
</Directory>
SSI configuration
Server side includes (SSI) are disabled by default. I turn them on for .shtml files like this:
# ============================================================
# SSI configuration
# ============================================================
<IfModule mod_mime.c>
AddType text/html .shtml
AddHandler server-parsed .shtml
DirectoryIndex index.shtml
</IfModule>
In addition, any directory that wants to allow SSI needs to say something like this:
<Directory /var/www/pelargir>
Options Includes
</Directory>
Note: An alternative is IncludesNOEXEC, which means that SSI are permitted, but the #exec cmd and #exec cgi are disabled.
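A minimal sketch of the IncludesNOEXEC variant, in Apache 2.2 syntax. The directory name is a made-up example, not part of my configuration:

```apache
# Hypothetical directory: SSI is allowed here, but "#exec cmd" and
# "#exec cgi" are not. The "+" adds the option to whatever options are
# already in effect instead of replacing them.
<Directory /var/www/pelargir/static>
    Options +IncludesNOEXEC
</Directory>
```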
CGI scripts in /usr/lib/cgi-bin
Various Debian packages (e.g. awstats, mailman) operate as CGI scripts in the generic directory /usr/lib/cgi-bin. Apache must be configured so that it executes these scripts correctly. I choose to do this in the following way (as usual with options that are not site-specific I place them in /etc/apache2/conf.d/pelargir.conf):
# ============================================================
# CGI configuration
# - /usr/lib/cgi-bin should be available as /cgi-bin for all
# Virtual Servers
# - however, we use the normal Alias instead of ScriptAlias
# because we don't want every file in the directory to
# automatically become a CGI
# - instead we add a CGI handler for some file extensions
# and allow execution of CGIs in /cgi-bin with
# "Options +ExecCGI"
# - some larger packages may require .htaccess files for
# restricting access to parts of the package
# ============================================================
<IfModule mod_cgi.c>
Alias /cgi-bin/ /usr/lib/cgi-bin/
<Directory /usr/lib/cgi-bin/>
Allow from all
Options +ExecCGI
AllowOverride All
AddHandler cgi-script .cgi .sh .pl
</Directory>
</IfModule>
Documentation
The documentation in /usr/share/doc should be available via HTTP:
# ============================================================
# Documentation should be available for all Virtual Servers
# ============================================================
Alias /doc/ /usr/share/doc/
<Directory /usr/share/doc/>
Allow from all
Options FollowSymLinks Includes Indexes
</Directory>
# Fix a too restrictive set of options in the apache2-doc package
<Directory "/usr/share/doc/apache2-doc/manual/">
Allow from 192.168.0.0/255.255.0.0
</Directory>
Options directive
The Options directive controls which server features are available in a particular directory. Normally, if multiple Options could apply to a directory, then the most specific one is used and others are ignored; the options are not merged. However if all the options on the Options directive are preceded by a "+" or "-" symbol, the options are merged.
- ExecCGI is useful, but should be set only in rare cases, where it is really necessary
- FollowSymLinks is probably the most useful setting
- Includes is useful only where SSI are required
- Indexes exposes too much of the system and should be applied only with care
- MultiViews is arcane (to me); it is useful if the same resource is present in more than one way (e.g. multiple languages) and the client should be able to say which type of resource it understands/prefers
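The merge behaviour described above can be illustrated with a small sketch. The directory names are invented for the example:

```apache
# Hypothetical parent directory with two options set.
<Directory /var/www/example>
    Options Indexes FollowSymLinks
</Directory>

# Without "+" or "-": the more specific Options line replaces the
# inherited one, so only Includes is in effect here.
<Directory /var/www/example/replaced>
    Options Includes
</Directory>

# With "+" and "-": the options are merged with the inherited ones,
# so FollowSymLinks and Includes are in effect here, Indexes is not.
<Directory /var/www/example/merged>
    Options +Includes -Indexes
</Directory>
```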
.htaccess files
While processing a request, Apache looks for the first existing .htaccess file in every directory of the path to the document, if distributed configuration files are enabled for that directory. It is possible to completely disable .htaccess files by saying
AllowOverride None
for a given directory. It is also possible to allow only certain groups of directives within .htaccess files; in this case AllowOverride lists the allowed directive types (e.g. AuthConfig, FileInfo, Indexes, Limit, Options).
Note: .htaccess is only a default file name; if you wish you can define additional or completely different file names through the AccessFileName directive.
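A sketch of both settings, assuming a hypothetical directory and a hypothetical alternative file name (.acl):

```apache
# Server-wide: look for ".htaccess" first, then for ".acl"
# (".acl" is an invented example name).
AccessFileName .htaccess .acl

# Hypothetical directory: access files may only contain
# authentication/authorization directives, nothing else.
<Directory /var/www/pelargir/private/>
    AllowOverride AuthConfig
</Directory>
```

If a different file name is used, it is probably wise to also make sure that the web server does not serve such files to clients.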
Evaluation order
Configuration sections are evaluated in the following order:
- <Directory> and .htaccess files simultaneously, but .htaccess files can override settings on the <Directory> directive
- Directives <DirectoryMatch> and <Directory> with RegExps
- Directives <Files> and <FilesMatch> simultaneously
- Directives <Location> and <LocationMatch> simultaneously
- Order of <Directory> directives: first the shorter directories (e.g. / before /doc); if directories are the same, then the order in which statements appear in the configuration
- all other directives: the order in which they appear in the configuration
- VirtualHosts are evaluated after the regular configuration
Generally one can say: "last wins".
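The following sketch (Apache 2.2 syntax, modelled on the example in the Apache documentation on configuration sections) shows what "last wins" means in practice. The host name is a placeholder:

```apache
# <Location> sections are evaluated after <Directory> sections,
# so this section has the last word: access is denied everywhere.
<Location />
    Order deny,allow
    Deny from all
</Location>

# This <Directory> section therefore has no effect, although it
# appears later in the configuration file.
<Directory />
    Order allow,deny
    Allow from all
    Deny from badguy.example.com
</Directory>
```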
Default character set / encoding
The following config snippet sets a default character set / encoding which Apache applies to resources that have the content types text/plain or text/html:
<IfModule mod_mime.c>
<Directory />
AddDefaultCharset utf-8
</Directory>
</IfModule>
CGI scripts
Options +ExecCGI says that - in addition to any other options - execution of CGI scripts is allowed in a certain directory. Nothing works without this directive!!!
Once CGI execution has been allowed via ExecCGI, there are various ways to mark files as being CGI scripts:
- ScriptAlias works exactly the same as Alias, but it additionally "blesses" the directory so that all (!) files inside the directory become CGIs.
- AddHandler uses file extensions to mark files as CGI scripts. Example:
AddHandler cgi-script cgi pl
- SetHandler marks all matching files as CGI scripts. This is roughly equivalent to ScriptAlias, but SetHandler is probably more flexible because it is usable in more contexts than ScriptAlias.
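A minimal sketch of the SetHandler variant. The directory is a made-up example; the point is only that every file below it is treated as a CGI script, regardless of its extension:

```apache
# Hypothetical directory that contains nothing but CGI scripts.
<Directory /var/www/pelargir/scripts/>
    # Allow CGI execution in addition to the inherited options
    Options +ExecCGI
    # Treat every file in this directory as a CGI script
    SetHandler cgi-script
</Directory>
```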
mod_rewrite
References
http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html
http://httpd.apache.org/docs/2.2/misc/rewriteguide.html
Overview
In order to use the module, the following directive is necessary
RewriteEngine on
The engine is turned on/off per directory, or per virtual host.
The most commonly used directive is RewriteRule. It takes 2 arguments:
- a regular expression that is used to match an address
- a result to which the matching address should be rewritten
Example:
RewriteRule ^/$ /index.html
Usually a RewriteRule must refer to an existing path, i.e. it cannot refer to an alias. For instance, the following does not work:
Alias /xyz/ /abc/def/
RewriteRule ^/$ /xyz/index.html
The reason for this behaviour is explained at the end of the mod_rewrite documentation:
http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#RewriteBase
Note: The directive RewriteBase can only be used inside a .htaccess file or within a <Directory> directive.
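A possible way around the alias limitation is the PT ("pass through") flag, which hands the rewritten URL back to the URL mapping phase so that directives such as Alias get a chance to process it. The following is a sketch based on the non-working example above; I have not verified it against my own configuration:

```apache
Alias /xyz/ /abc/def/

RewriteEngine on
# [PT] passes the result on to the next URL-to-filename translator,
# here mod_alias, which then maps /xyz/index.html to /abc/def/index.html
RewriteRule ^/$ /xyz/index.html [PT]
```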
Redirects
Currently I know of 3 ways to perform a redirect:
- inside an .html document
<META HTTP-EQUIV="refresh" CONTENT="0; URL=http://foo.bar/service">
- with mod_alias
Redirect /service http://foo.bar/service
- with mod_rewrite
RewriteRule ^/$ /phpws/index.php [R,L]
Some explanations:
- the flag R means "Redirect"; the server will send the HTTP status code 302 ("moved temporarily"); in order to send a different status code, one needs to specify R=<statuscode>
- the flag L means "Last", i.e. no further RewriteRules are applied after the current one
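A short sketch of a redirect with an explicit status code. The paths are invented for the example:

```apache
RewriteEngine on
# Send HTTP status 301 ("moved permanently") instead of the default 302
# and stop processing further rules. Both paths are hypothetical.
RewriteRule ^/old-service/?$ /service/ [R=301,L]
```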
Query strings
Query strings are the part at the end of a URL that starts with a question mark character ("?"). Apache processes query strings in a special way: they are not part of the URL path that a RewriteRule pattern is matched against. For this reason, query strings cannot be matched by a RewriteRule!
Apache stores query strings in the variable QUERY_STRING. In order to test for the presence of a query string, one has to use a RewriteCond directive that processes the variable QUERY_STRING. For instance:
RewriteCond %{QUERY_STRING} ^q=redirect_url=
RewriteRule <...>
Note: The "?" character is not part of the QUERY_STRING variable.
After a RewriteRule has performed its substitution task, the original query string is appended to the result. For instance:
RewriteCond %{QUERY_STRING} ^q=redirect_url=
RewriteRule .* /foobar/ [R,L]
This replaces everything (".*") by "/foobar/" and appends the query string, thus producing the URL /foobar/?q=redirect_url= (assuming the original query string was exactly "q=redirect_url=").
If the substitution part of a RewriteRule already contains a new query string, this new query string replaces the original query string. For instance:
RewriteCond %{QUERY_STRING} ^q=redirect_url=
RewriteRule .* /foo/?q=bar [R,L]
This rule produces the URL /foo/?q=bar.
In order to "combine" the old with the new query string the flag QSA can be used:
RewriteCond %{QUERY_STRING} ^q=redirect_url=
RewriteRule .* /foo/?q=bar [R,L,QSA]
This rule produces the URL /foo/?q=bar&q=redirect_url=. The query strings are joined by an ampersand ("&") character.
To entirely remove a query string, the last character in the substitution string of a RewriteRule must be a "?" character. For instance:
RewriteCond %{QUERY_STRING} ^q=redirect_url=
RewriteRule .* /foobar/? [R,L]
This rule produces the URL /foobar/.
To re-use only parts of a query string you have to use backreferences:
RewriteCond %{QUERY_STRING} ^q=redirect_url=(.*)
RewriteRule .* %1? [R,L]
This rule replaces the old URL by a new one. The new URL consists of the part of the old query string that follows "redirect_url=". %1 is the backreference to the (.*) part in the RewriteCond directive.
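The same backreference technique can be used to move a query string parameter into the path. The following is a sketch for the server or virtual host context; the paths /item and /items and the parameter name id are invented for the example:

```apache
RewriteEngine on
# Match only if the query string is exactly "id=<digits>"
RewriteCond %{QUERY_STRING} ^id=([0-9]+)$
# %1 refers to the digits captured by the RewriteCond above; the
# trailing "?" removes the original query string from the result.
# A request for /item?id=42 is redirected to /items/42
RewriteRule ^/item$ /items/%1? [R,L]
```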
Virtual Hosts
Virtual Host (or vhost) configuration is available in a separate page on this wiki: ApacheVirtualHost.
SSL
Overview
In order to allow https connections, Apache needs to listen on a different port than the default HTTP port 80. The default https port is 443. Apache also needs a server certificate that it can hand out to clients (e.g. web browsers) and that these clients can use to a) verify whether the server actually is who it claims to be, and b) encrypt communication. The page ServiceEncryptionWithSSL contains instructions how to create and sign server certificates.
The most important directives are
- SSLEngine on turns on SSL for a specific virtual host
- SSLCertificateFile and SSLCertificateKeyFile are used to specify the location of the certificate file and the RSA key file
- SSLCertificateChainFile is used to specify the location of a file that contains a chain of parent certificates, up to the root certificate
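Put together, the directives listed above might be used like this. This is only a sketch: the server name and all file paths are placeholders, not my actual setup:

```apache
<VirtualHost *:443>
    # Placeholder host name
    ServerName www.example.org

    # Turn on SSL for this virtual host
    SSLEngine on

    # Placeholder paths to the server certificate, its private key,
    # and the chain of parent certificates
    SSLCertificateFile      /etc/ssl/certs/www.example.org.crt
    SSLCertificateKeyFile   /etc/ssl/private/www.example.org.key
    SSLCertificateChainFile /etc/ssl/certs/example-ca-chain.crt
</VirtualHost>
```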
SSL and name-based virtual hosts
SSL and name-based virtual hosts are on somewhat "unfriendly terms", which means that normally it is not possible to define multiple name-based hosts all on the same port 443. A good explanation of the problem can be found at http://httpd.apache.org/docs-2.0/ssl/ssl_faq.html, although it should be obvious why the problem exists if you know
- About protocol layers
- That the HTTP layer comes on top of the SSL layer, and
- That name-based virtual hosts are based on the HTTP Host: header
The consequence is that normally you can have only one SSL enabled virtual host on any given combination of "IP address/TCP port". Instead of "normally", I should probably say "traditionally", because these days a solution to the problem exists: A TLS extension named SNI (Server Name Indication). Wikipedia has good information about the subject. The gist is that a client that uses the extension will tell Apache which server name it connects to as part of the TLS negotiation phase, i.e. before the SSL protocol layer is established. This allows Apache to choose the correct name-based virtual host, whose configuration will indicate the correct SSL certificate to serve.
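With SNI, a configuration along the following lines should become possible. This is a sketch in Apache 2.2 syntax that assumes both Apache and the underlying OpenSSL library were built with SNI support; host names and file paths are placeholders:

```apache
# Name-based virtual hosts on the https port
NameVirtualHost *:443

# First virtual host: also the one that clients without SNI support
# end up with, because Apache cannot tell which name they asked for.
<VirtualHost *:443>
    ServerName www.example.org
    SSLEngine on
    SSLCertificateFile    /etc/ssl/certs/www.example.org.crt
    SSLCertificateKeyFile /etc/ssl/private/www.example.org.key
</VirtualHost>

# Second virtual host with its own certificate
<VirtualHost *:443>
    ServerName wiki.example.org
    SSLEngine on
    SSLCertificateFile    /etc/ssl/certs/wiki.example.org.crt
    SSLCertificateKeyFile /etc/ssl/private/wiki.example.org.key
</VirtualHost>
```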
/etc/apache2/conf.d/pelargir-ssl.conf
The file named in the section heading contains global SSL configuration options that are the same for all SSL-enabled virtual hosts.
<IfModule mod_ssl.c>
# Disabled because port 443 is already enabled in /etc/apache2/ports.conf
# Listen 443
Listen 192.168.0.2:901
# Some MIME-types for downloading Certificates and CRLs
AddType application/x-x509-ca-cert .crt
AddType application/x-pkcs7-crl .crl
# Semaphore:
# Configure the path to the mutual exclusion semaphore the
# SSL engine uses internally for inter-process synchronization.
SSLMutex file:/var/run/mod_ssl_mutex
# Inter-Process Session Cache:
# Configure the SSL Session Cache: First either `none'
# or `dbm:/path/to/file' for the mechanism to use and
# second the expiring timeout (in seconds).
SSLSessionCache dbm:/var/run/mod_ssl_scache
SSLSessionCacheTimeout 300
#SSLSessionCache none
# Pseudo Random Number Generator (PRNG):
# Configure one or more sources to seed the PRNG of the
# SSL library. The seed data should be of good random quality.
SSLRandomSeed startup file:/dev/urandom 512
SSLRandomSeed connect file:/dev/urandom 512
# Logging:
# The home of the dedicated SSL protocol logfile. Errors are
# additionally duplicated in the general error log file. Put
# this somewhere where it cannot be used for symlink attacks on
# a real server (i.e. somewhere where only root can write).
# Log levels are (ascending order: higher ones include lower ones):
# none, error, warn, info, trace, debug.
#SSLLog /var/log/apache2/ssl_engine.log
#SSLLogLevel info
</IfModule>
Authentication / Authorization
References
- Overview: http://httpd.apache.org/docs/2.2/howto/auth.html
- mod_auth_basic: http://httpd.apache.org/docs/2.2/mod/mod_auth_basic.html
- mod_auth_digest: http://httpd.apache.org/docs/2.2/mod/mod_auth_digest.html
- Require directive: http://httpd.apache.org/docs/2.2/mod/core.html#require
- mod_authnz_ldap: http://httpd.apache.org/docs/2.2/mod/mod_authnz_ldap.html
Overview
Sometimes a resource made available by the web server needs to be protected from public access. These are the configuration steps necessary to do this:
- generate a password file (typically named .htpasswd) that contains a list of users that are allowed to access the resource, and the users' passwords
- configure Apache
- tell Apache that authentication is required to access the resource
- tell Apache where the password file is
- tell Apache which users in the password file are authorized to access the resource
- this configuration is done through directives that are located
- either inside the main server configuration files
- or inside an access file (typically named .htaccess); in this case the directory where the access file is located needs to be configured with "AllowOverride AuthConfig"
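The configuration steps above might look like this in the main server configuration. This is a minimal sketch for basic authentication with a password file; the directory, the realm name and the location of the password file are placeholders:

```apache
# Hypothetical directory to protect
<Directory /var/www/pelargir/private/>
    # Tell Apache that authentication is required, and which type
    AuthType Basic
    AuthName "Restricted area"
    # Tell Apache where the password file is (placeholder path; the
    # file can be created with the htpasswd utility)
    AuthBasicProvider file
    AuthUserFile /etc/apache2/.htpasswd
    # Tell Apache which users are authorized: here, any user that
    # appears in the password file
    Require valid-user
</Directory>
```

The password file should be stored outside of any directory that is served by the web server.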
When a web client tries to access a protected resource the following happens:
- the web server sends a challenge to the web client to authenticate itself
- if the web client is a web browser, it will usually ask the user for a username and a password
- the username/password entered is sent back to the web server
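- if basic authentication is used, the password will be sent essentially in clear text (it is merely Base64-encoded)
- if digest authentication is used, the password itself is not sent; instead the client sends an MD5 hash computed from the password and some other values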
- if the connection between web client and web server is not encrypted, the clear text password or the password hash are visible to anyone who can view network traffic between server and client
- the web server looks up the entry in the password file that matches the username submitted by the web client
- if there is no entry, authentication fails and access is denied
- if there is an entry, the web server compares the entry's password to the clear text password or password hash submitted by the web client
- if the passwords do not match, authentication fails and access is denied
- if the passwords match, authentication succeeds
- as the final step, the web server checks whether the user that just authenticated itself is authorized to access the resource
- if the user is not authorized, access is denied
- if the user is authorized, access is granted
Framework used by Apache2
Each authentication / authorization use case is governed by the following framework
- the configuration must specify which authentication type is to be used
- basic authentication = mod_auth_basic module
- digest authentication = mod_auth_digest module
- both modules are considered front-ends that delegate actual authentication and authorization to one of several possible authentication providers and authorization providers
- authentication providers
- mod_authn_file module: if user names and passwords are located in a password file
- mod_authnz_ldap module: if user names and passwords are located in an LDAP directory
- authorization providers
- if authentication provider was mod_authn_file, there usually is no authorization provider because usually authorization is set to succeed automatically if authentication succeeds
- mod_authnz_ldap module: can be used to fine-tune authorization by looking up rules located in an LDAP directory
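As an illustration of how front-end and provider are combined, here is a sketch for digest authentication with a file-based provider. The location, the realm name and the file path are placeholders:

```apache
# Hypothetical protected location
<Location /private/>
    # Front-end: digest authentication (mod_auth_digest)
    AuthType Digest
    # The realm name must match the realm used when the password
    # file was created with the htdigest utility
    AuthName "private area"
    AuthDigestDomain /private/
    # Authentication provider: password file (mod_authn_file)
    AuthDigestProvider file
    AuthUserFile /etc/apache2/.htdigest
    # Authorization: succeeds for any authenticated user
    Require valid-user
</Location>
```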
mod_authn_file
TBD
LDAP
Why not PAM?
There exists a Debian package libapache2-mod-auth-pam which promises authentication through PAM. The package is based on what can be found at
http://pam.sourceforge.net/mod_auth_pam/
The web site says that the module is no longer maintained, therefore I prefer using the well-maintained module mod_authnz_ldap.
mod_authnz_ldap
Note: mod_authnz_ldap does not support digest authentication!
mod_authnz_ldap does its job in two phases:
- authentication phase (aka search/bind phase); in this phase mod_authnz_ldap does the following things:
- connect to the LDAP directory using a special "search" DN
- search for an entry that matches the username submitted by the web client
- if a single unique match is found, try to bind to the LDAP server using the entry and the password submitted by the web client
- if binding is successful, authentication is considered to be successful
- authorization phase (aka compare phase); in this phase mod_authnz_ldap does the following things:
- what happens in this phase depends on whether a Require directive is present in the server configuration or .htaccess file, and what the directive specifies
- if no Require directive exists, authorization succeeds automatically without authentication
- Require valid-user = authorization succeeds automatically if authentication was successful; for this to work, the mod_authz_user module must be loaded and the AuthzLDAPAuthoritative directive must be set to off
- Require ldap-user|ldap-group|ldap-dn|ldap-attribute|ldap-filter = authorization succeeds if a search of the LDAP directory using the specified user, group, DN, attribute or general LDAP filter produces the proper results
- consult the docs for mod_authnz_ldap for details
LDAP connection information should be stored only once, especially because the information contains a password that needs to be protected. To achieve this, I store the information in a separate file that I include wherever I need it:
pelargir:/etc/apache2# l pelargir-ldap.conf
-r-------- 1 root root 259 Oct 1 21:43 pelargir-ldap.conf
pelargir:/etc/apache2# cat pelargir-ldap.conf
# ============================================================
# One-time definition of LDAP connection and other information.
# This information should be included whenever LDAP
# authentication and authorization is used.
# ============================================================
<IfModule mod_authnz_ldap.c>
AuthLDAPUrl ldap://127.0.0.1:389/ou=users,dc=herzbube,dc=ch?uid?sub?(objectClass=*)
AuthLDAPBindDN cn=readonly-users,ou=users,dc=herzbube,dc=ch
AuthLDAPBindPassword secret
# Tell mod_authnz_ldap to use attribute "memberUid" to check
# for group membership
AuthLDAPGroupAttribute memberUid
# Tell mod_authnz_ldap that the group's memberUid attribute
# contains simple user names (e.g. "foo"), not the whole user DN
# (e.g. "cn=foo,ou=users,dc=herzbube,dc=ch")
AuthLDAPGroupAttributeIsDN off
</IfModule>
To protect a directory, use the following configuration either in the main server configuration, or in a .htaccess file:
<Directory /foo/>
Include pelargir-ldap.conf
AuthName "bar"
AuthType Basic
AuthBasicProvider ldap
Require valid-user
AuthzLDAPAuthoritative off
</Directory>
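To restrict access to the members of an LDAP group instead of allowing every authenticated user, a variant like the following should work. This is a sketch only: the group DN is invented, and the group membership check relies on the memberUid settings in pelargir-ldap.conf:

```apache
<Directory /foo/>
    Include pelargir-ldap.conf
    AuthName "bar"
    AuthType Basic
    AuthBasicProvider ldap
    # Hypothetical group; authorization succeeds only if the
    # authenticated user is listed in the group's memberUid attribute
    Require ldap-group cn=webadmins,ou=groups,dc=herzbube,dc=ch
</Directory>
```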
Note: It seems to be impossible to specify the LDAP connection information for <Directory /> and to let other directories inherit it. Issue 45946 at issues.apache.org describes the problem. For the moment, I therefore have chosen the workaround to use Apache's Include directive to prevent duplication of connection information.
mod_ldap
mod_ldap provides the actual LDAP support required by mod_authnz_ldap as well as a caching mechanism to improve web server performance. Basically the module's defaults are OK, but I like to add a URL that allows inspection of the cache. The following configuration block in pelargir.conf achieves this:
# ============================================================
# LDAP configuration
# - the default values for mod_ldap are OK (1024 cache entries, TTL
# for entries = 600 seconds)
# - add a cache information URL so that the cache status can be
# inspected
#
# Note 1: During testing it might be useful to reduce the longevity
# of cache entries by saying "LDAPCacheTTL 5" (entries expire
# after 5 seconds) or "LDAPCacheEntries 0" (cache size is zero).
#
# Note 2: The file name to use with IfModule to check for the presence
# of mod_ldap is "util_ldap.c" (not "mod_ldap.c" as one would expect).
# Alternatively, the module identifier that can also be used with
# IfModule since Apache 2.1 is "ldap_module".
# ============================================================
<IfModule ldap_module>
<Location /ldapcache-info>
SetHandler ldap-status
Allow from all
</Location>
</IfModule>
PHP
Debian packages
php5 libapache2-mod-php5
References
http://wiki.debian.org/PHP
/usr/share/doc/php5/README.Debian.gz
PHP modules
| PHP 4 | PHP 5 | Reason (depends, recommends, suggests) |
|---|---|---|
| php4-cgi | php5-cgi | gallery2, phpmyadmin, squirrelmail, websvn |
| php4-cli | php5-cli | other PHP modules |
| php4-gd | php5-gd | gallery2, phpmyadmin |
| php4-ldap | php5-ldap | squirrelmail, phpldapadmin |
| php4-mcrypt | php5-mcrypt | phpmyadmin |
| php4-mysql | php5-mysql | gallery2, phpmyadmin |
| php4-pear | php5-pear | squirrelmail |
Note: After a Debian package for a PHP module has been installed, it may be necessary to reload/restart Apache to make the module available.
php5-gd
Installing the php5-gd package leads to a series of other Debian packages being installed. The biggest surprise here is that some of these packages are X11 packages:
- php5-gd depends on libgd2-xpm (replacing libgd2-noxpm)
- libgd2-xpm depends on libxpm4 and libx11-6
- libxpm4 depends on x11-common
- etc.
php.ini
PHP exists in 3 "versions":
- the command line (CLI) version
- the CGI version
- the Apache module version
Depending on the version being executed, PHP locates its configuration in different directories:
/etc/php5/cli
/etc/php5/cgi
/etc/php5/apache2
Within each directory, PHP looks for a central configuration file php.ini. In addition, PHP scans the sub-directory conf.d (if it exists) and treats any file found in the sub-directory ending in .ini as a configuration file. By default, in all versions of PHP conf.d is a symlink that points to a central directory that contains shared configuration files. The central directory is located here:
/etc/php5/conf.d
This makes it easy to modify the PHP configuration: Simply drop a new file into the conf.d directory instead of modifying the central configuration file php.ini. For instance, Debian packages for PHP modules drop a module-specific configuration file into conf.d. Each module's configuration file then at least "activates" the module through an "extension=" statement. For instance:
pelargir:/etc/php5/conf.d# cat gd.ini
# configuration for php GD module
extension=gd.so
Web browser problems
Sometimes you have enabled PHP for a directory, and everything seems to be configured just right, but the web browser still tries to download the .php file instead of letting the web server "execute" it. If this happens, it might help to a) click the browser's "reload" button, or b) to clear the browser's cache.
Upgrading to Apache 2
References
http://www.linux.com/article.pl?sid=05/09/01/186204
http://httpd.apache.org/docs/2.0/upgrading.html
Personal notes
The upgrade became necessary when I wanted to use the Subversion module for Apache, for which there only exists an Apache 2 module. The Debian package for the module is libapache2-svn. Installation of the module package forced the following dependencies to be installed:
apache2 apache2-common apache2-mpm-worker
Apache 1.3 was still installed and running on port 80. This caused Apache 2 to become installed in a de-activated state. It was also not possible to manually start the Apache 2 daemon because of this conflict. In order to be able to test the new configuration while the old daemon was still running and serving requests from the Internet, I changed the port on which Apache 2 was listening to 8080. I did this by modifying
/etc/apache2/ports.conf
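The change amounts to a single directive. The following sketch shows what the file might have looked like during the test phase; I no longer have the original content:

```apache
# /etc/apache2/ports.conf during the test phase:
# listen on port 8080 so that Apache 1.3 can keep port 80
Listen 8080
```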
I was then able to gradually change and test the Apache 2 configuration until it was up to the state I had with Apache 1.3. This involved the following steps:
- enable various modules with the command line utility a2enmod; I took some hints about which modules need to be enabled from the module configuration of Apache 1.3 in /etc/apache2/modules.conf
- create sub-directories in /var/log/apache2 for the log files of the various virtual hosts
- modify the various site configuration files in /etc/awstats so that the LogFile option points to the new Apache 2 log directories
- make sure that already installed packages that are based on Apache have their configuration updated for Apache 2
- sometimes this was as simple as saying dpkg-reconfigure <package>; the DebConf process then detected that Apache 2 was available and prompted me for the version of Apache that I wanted to install a configuration for
- sometimes I had to manually create a symlink, such as
ln -s /etc/gallery/apache.conf /etc/apache2/conf.d/gallery.conf
- install the Debian package libapache2-mod-php4; this caused apache2-mpm-worker to be replaced by apache2-mpm-prefork
- modify /etc/php4/apache2/php.ini so that it contains the following lines
extension=ldap.so
extension=mysql.so
extension=domxml.so
extension=gd.so
extension=mcrypt.so
extension=imap.so
When I was finished with the upgrade process, I reset the port in /etc/apache2/ports.conf to 80, changed the file /etc/default/apache2 to contain the string NO_START=0 and uninstalled the old Apache 1.3 package.
Troubleshooting
Address already in use
The following error message on startup
(98)Address already in use: make_sock: could not bind to address [::]:443
means that the daemon tries to bind to the same address/port several times. I encountered this problem during an upgrade where the "Listen 443" directive appeared twice, once in /etc/apache2/ports.conf and once in /etc/apache2/conf.d/pelargir-ssl.conf.
NameVirtualHost *:80 has no VirtualHosts
When the message "NameVirtualHost *:80 has no VirtualHosts" appears on startup, there are up to two possible causes for the problem:
- the directive "NameVirtualHost *:80" appears multiple times in the configuration
- the VirtualHost "*:80" does not appear anywhere in the configuration; a VirtualHost "*", i.e. without port number, is not sufficient
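A configuration that avoids both causes might look like this. It is a sketch in Apache 2.2 syntax; the ServerName and DocumentRoot values are merely illustrative:

```apache
# Appears exactly once in the entire configuration
NameVirtualHost *:80

# The argument of <VirtualHost> matches the argument of
# NameVirtualHost, including the port number
<VirtualHost *:80>
    ServerName www.herzbube.ch
    DocumentRoot /var/www/pelargir
</VirtualHost>
```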
Log file analyzer
Overview
There are several log file analyzers out there. Three popular ones of which I have heard and that are also available from Debian are:
- awstats
- analog
- webalizer
An interesting comparison table can be found at
http://awstats.sourceforge.net/docs/awstats_compare.html
After taking a short look at each package I discovered that both analog and webalizer have not seen any developer activities for quite some time now. The last analog release happened in December 2004, the last webalizer release in April 2002.
One could argue that log file formats have been standardized a long time ago and that these packages are stable and do not need any further development. And yet - for some strange reason, if I have to choose between old and supposedly stable software, and newer, more actively developed and supported software, I tend to choose the newer alternative.
In short, I use awstats, whose latest release (as of writing this section) happened in January 2007.
Debian package
The Debian package to install is
awstats
Configuration
The following configuration steps use my model domain herzbube.ch. Other sites/domains can be configured in the same way.
Copy the configuration example:
cp /usr/share/doc/awstats/examples/awstats.model.conf.gz /etc/awstats/awstats.www.herzbube.ch.conf.gz
gunzip /etc/awstats/awstats.www.herzbube.ch.conf.gz
Modify the awstats.www.herzbube.ch.conf file:
LogFile="/usr/share/awstats/tools/logresolvemerge.pl /var/log/apache2/herzbube.ch/access.log* |"
SiteDomain="www.herzbube.ch"
HostAliases="REGEX[.+\.herzbube.ch] herzbube.ch 127.0.0.1 localhost"
AllowToUpdateStatsFromBrowser=0
AllowFullYearView=3
Notes:
- AllowToUpdateStatsFromBrowser is disabled because statistics updates are triggered automatically by cron
Next we need to modify the Apache configuration to contain the following (I place this in /etc/apache2/conf.d/pelargir.conf):
<Directory /usr/share/awstats>
Options None
AllowOverride None
Order allow,deny
Allow from all
</Directory>
<Directory /usr/share/doc/awstats/examples>
Options None
AllowOverride None
Order allow,deny
Allow from all
</Directory>
Alias /awstatscss/ /usr/share/doc/awstats/examples/css/
Alias /icon/ /usr/share/awstats/icon/
Note: The actual awstats.pl script is located in /usr/lib/cgi-bin and must be executed by the web server from there. If you have not yet configured the directory for CGI execution, you must do so now. See further up for a section that describes how to do this.
Statistics update
The awstats package includes the following cron file
/etc/cron.d/awstats
which contains an entry that triggers the awstats statistics update process for all sites that have a configuration file with a name that matches the following pattern:
/etc/awstats/awstats.*conf
By default the statistics update may also be triggered manually by a user who accesses awstats.pl via web browser. However, I have disabled this by setting AllowToUpdateStatsFromBrowser to 0 in /etc/awstats/awstats.conf.
Permissions
The Debian package for Apache installs a configuration file for the logrotate package. The default file content at the time of writing looks like this:
pelargir:~# cat /etc/logrotate.d/apache2
/var/log/apache2/*.log {
weekly
missingok
rotate 52
compress
delaycompress
notifempty
create 640 root adm
sharedscripts
postrotate
/etc/init.d/apache2 reload > /dev/null
endscript
}
The above logrotate configuration contains the statement
create 640 root adm
This statement makes sure that after rotation the log file is re-created with restricted permissions, with the intent to protect the log file (which might contain sensitive information) from prying eyes. Unfortunately, those restricted permissions also prevent the awstats statistics update script from accessing the rotated log files, because the update script always runs as the user www-data:
- When the user manually triggers a statistics update via browser, the update script runs using the permissions of the web server
- When cron triggers the update, the cron config file /etc/cron.d/awstats specifies to also run the update as user www-data
This problem can be solved in several ways
- Add user www-data to group adm. The drawback of this method is that www-data can now access every file that belongs to that group, which might be more than is intended
- Change the logrotate permissions to 644. This also requires that the permissions for subfolders of /var/log/apache2 are opened up (directories additionally need the execute bit, e.g. 755) so that the awstats update script can peer into those folders. The drawback of this method is that now everybody who has access to the system can read the web server logs. Also the changes might be reverted by a future update of the apache package.
- Change the awstats cron script so that cron runs the update script as root. The drawback of this method is that the update script now has write access to the entire system. Also the change might be reverted by a future update of the awstats package.
For the moment, I have decided to add www-data to the adm group, because this seems to be the change with the least impact.
