AWStats

From HerzbubeWiki

AWStats is a log file analyzer. I use it for analyzing the Apache web server log files (which is also where the software has its origins).


Debian package

The Debian package to install is

awstats

An optional package to map IP addresses to geo locations is

libgeo-ipfree-perl


AWStats configuration

General approach

The AWStats configuration is located in

/etc/awstats

The script that updates the AWStats statistics processes the main configuration file /etc/awstats/awstats.conf, and files that match this pattern:

/etc/awstats/awstats.*.conf

The idea is that every virtual host for which separate log files exist has its own configuration file. The naming scheme I adopt is this:

awstats.<vhost name>.conf

For instance,

awstats.www.herzbube.ch.conf
awstats.wiki.herzbube.ch.conf
[...]
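
To see which vhosts the naming scheme currently covers, a small helper can derive the vhost names from the file names. This is only a sketch: the function name list_awstats_vhosts is made up for illustration, and the directory argument exists so the function can be tried against a scratch folder.

```shell
# Hypothetical helper: print the vhost name encoded in every
# awstats.<vhost name>.conf file found in the given directory
# (default: /etc/awstats).
list_awstats_vhosts() {
    conf_dir="${1:-/etc/awstats}"
    for f in "$conf_dir"/awstats.*.conf; do
        # If nothing matches, the pattern is left unexpanded: skip it
        [ -e "$f" ] || continue
        name="${f##*/}"          # strip the directory part
        name="${name#awstats.}"  # strip the "awstats." prefix
        name="${name%.conf}"     # strip the ".conf" suffix
        printf '%s\n' "$name"
    done
}
```

The pattern does not match awstats.conf or awstats.conf.local, so only the vhost-specific files are listed. The output can be fed to awstats.pl one vhost at a time, for instance with -config=<vhost name> -update.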


The Debian way

The official method for configuring AWStats is to create a clone of the main configuration file /etc/awstats/awstats.conf for each virtual host, and to modify those few properties that differ from the defaults for that vhost. In my opinion this is a horrible approach, because it means duplicating more than 1500 lines - over and over and over... When the time comes to upgrade to a new AWStats version, you will have to update all those vhost-specific files, merging your local modifications with the upstream changes. Argh!


A much better approach is the one outlined in the Debian README (/usr/share/doc/awstats/README.Debian.gz):

  • Leave the main configuration file awstats.conf alone
  • Create vhost-specific configuration files which on their first line include the main configuration file, then continue to specify vhost-specific settings that override the defaults in the main configuration file
  • Add any settings that override the defaults in the main configuration file, but are shared among all vhosts, to awstats.conf.local. This file is included at the end of the main configuration file.


awstats.conf.local

These settings override the defaults and are shared among all vhosts:

AllowFullYearView=3

# This setting is required so that awstats.conf can be processed
# on its own, i.e. when not included from a vhost configuration
# file. If included from a vhost config, the vhost will override
# this setting.
SiteDomain="default"

Notes:

  • AllowFullYearView is set to 3, which means that stats for a full year can be viewed not only from the command line, but also when awstats is run as a CGI


Example configuration for www.herzbube.ch

The following configuration settings are in use for the www.herzbube.ch vhost. The other vhosts are all configured in the same way.

Include "/etc/awstats/awstats.conf"

LogFile="/usr/share/awstats/tools/logresolvemerge.pl /var/log/apache2/www.herzbube.ch/access.log |"
SiteDomain="www.herzbube.ch"
HostAliases="herzbube.ch 82.195.228.21"
DirData="/var/lib/awstats/www.herzbube.ch"


Important: Because we want to store vhost-specific data in a vhost-specific data directory, that directory must be manually created and given the same permissions as the parent folder.

mkdir /var/lib/awstats/www.herzbube.ch
chown --reference=/var/lib/awstats /var/lib/awstats/www.herzbube.ch
chmod --reference=/var/lib/awstats /var/lib/awstats/www.herzbube.ch
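
Since all vhosts are configured in the same way, the configuration file can be generated from the vhost name. The following is a minimal sketch under the assumption that the log file and data directory follow the same path layout as for www.herzbube.ch. The function name awstats_vhost_conf is hypothetical, and HostAliases is left out on purpose because it differs per vhost.

```shell
# Hypothetical helper: print an AWStats configuration for one vhost.
# Assumes the log file lives in /var/log/apache2/<vhost>/access.log
# and the data directory is /var/lib/awstats/<vhost>.
awstats_vhost_conf() {
    vhost="$1"
    cat <<EOF
Include "/etc/awstats/awstats.conf"

LogFile="/usr/share/awstats/tools/logresolvemerge.pl /var/log/apache2/$vhost/access.log |"
SiteDomain="$vhost"
DirData="/var/lib/awstats/$vhost"
EOF
}
```

Usage would be along the lines of: awstats_vhost_conf wiki.herzbube.ch > /etc/awstats/awstats.wiki.herzbube.ch.conf. The data directory still has to be created as shown above.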


Apache configuration

An example with some guidelines for the Apache configuration can be found here

/usr/share/doc/awstats/examples/apache.conf


But actually we don't have to do much, because our default Apache configuration is already set up with this:

  • CGI script execution is already enabled for /usr/lib/cgi-bin
  • Access to /usr/share/awstats is already granted by default
  • No special options are set and no overrides are allowed for /usr/share/awstats


So the main thing that remains to be done is protecting the statistics from unauthenticated access. Here is the configuration:

# ============================================================
# AWStats configuration
# ============================================================
# Among other things, this provides the graphics for the bar charts
Alias /awstats-icon/ /usr/share/awstats/icon/

# Restrict access to authenticated users
<Directory /usr/lib/cgi-bin>
  <Files awstats.pl>
    AuthName "awstats"
    AuthType Basic
    AuthBasicProvider ldap
    Require valid-user
  </Files>
</Directory>


Note: I am placing the configuration in the global /etc/apache2/conf-available/000-pelargir.conf, not in any of the vhost configurations, because AWStats is available in all vhosts.


Statistics update

cron

The awstats Debian package installs the following file:

/etc/cron.d/awstats

This file contains a crontab entry that triggers the awstats statistics update process by running this script:

/usr/share/awstats/tools/update.sh

The script processes the following files:

  • The main configuration file /etc/awstats/awstats.conf, which refers to the main Apache log file /var/log/apache2/access.log
  • All configuration files that match the pattern /etc/awstats/awstats.*.conf. As explained further up, on my system these refer to virtual host log files.


The default schedule is to run the update script every 10 minutes. This is much too often for my taste, so I changed this to run every hour, at 10 minutes past the hour.
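
The schedule change can be sketched as a filter over the crontab file. This assumes that the packaged entry for update.sh starts with the schedule fields "*/10 * * * *"; if the file looks different on your system, the filter leaves it untouched. The function name hourly_awstats_schedule is made up for illustration.

```shell
# Hypothetical filter: reads a crontab on stdin, writes it to stdout.
# On the line that mentions update.sh, the schedule "*/10 * * * *"
# (every 10 minutes) becomes "10 * * * *" (hourly, at minute 10).
# All other lines pass through unchanged.
hourly_awstats_schedule() {
    sed '/update\.sh/ s|^\*/10 \* \* \* \* |10 * * * * |'
}
```

For example: hourly_awstats_schedule < /etc/cron.d/awstats > /tmp/awstats.cron, then review the result before copying it back.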


Conflict with log rotation

The problem:

  • The awstats cron job runs every hour, at 10 minutes past the hour
  • Log rotation is triggered by cron.daily, which runs at 06:25 in the morning
  • This means that log entries written between 06:10 and 06:25 end up in the rotated file, so there is a 15-minute gap that is never seen by the awstats cron job


Example:

  • The awstats cron job runs at 06:10. It processes access.log.
  • All log output that is produced between 06:10 and 06:25 is written to Apache's access.log
  • At 06:25 log files are rotated, so the log output from the previous 15 minutes is now in access.log.1.
  • The awstats cron job runs at 07:10. It processes access.log. It doesn't see the stuff that is now in access.log.1.


An elegant solution to this problem is to run the awstats update routine before log rotation happens. The Debian package for Apache has provisions for this in its logrotate config file:

root@pelargir:~# cat /etc/logrotate.d/apache2 
/var/log/apache2/*.log {
        [...]
	prerotate
		if [ -d /etc/logrotate.d/httpd-prerotate ]; then \
			run-parts /etc/logrotate.d/httpd-prerotate; \
		fi; \
	endscript
}

The intent is that you place a script that runs the awstats update routine in this folder:

/etc/logrotate.d/httpd-prerotate


So what we do is this:

mkdir /etc/logrotate.d/httpd-prerotate
ln -s /usr/share/awstats/tools/update.sh /etc/logrotate.d/httpd-prerotate/awstats

Notes:

  • logrotate runs as user root, whereas the awstats cron job runs as user www-data. This will cause problems if the awstats cron job ever needs to write to a file that was previously created by the update routine when it ran from logrotate as root.
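
One detail about the symlink name: when run-parts is called without options, as in the logrotate config above, it only executes files whose names consist entirely of ASCII letters, digits, underscores and hyphens. This is why the link is named awstats and not update.sh. The check can be sketched like this (the function name runparts_name_ok is hypothetical):

```shell
# Hypothetical helper: succeed if run-parts, called without options,
# would accept the given file name. Runs in a subshell so that the
# locale change stays local to the function.
runparts_name_ok() (
    LC_ALL=C
    case "$1" in
        # Reject empty names and names with any other character
        '' | *[!A-Za-z0-9_-]*) exit 1 ;;
        *) exit 0 ;;
    esac
)
```

To see what would actually be executed, run-parts --test /etc/logrotate.d/httpd-prerotate prints the names of the scripts without running them.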


Manual update

The statistics update can also be triggered manually when a user accesses awstats.pl in a web browser, but this is disabled by default in the AWStats configuration. To enable it, the setting AllowToUpdateStatsFromBrowser must be set to 1.


Permissions

Apache writes log files with restricted permissions:

-rw-r----- 1 root adm 0 Jun 10 06:25 /var/log/apache2/access.log


Unfortunately, those restricted permissions also prevent the awstats statistics update script from reading the log files, because the update script usually runs as user www-data:

  • When the user manually triggers a statistics update via web browser, the update script runs using the permissions of the web server, which on Debian means as user www-data
  • The awstats cron job /etc/cron.d/awstats also runs the update as user www-data
  • The update script that is run by logrotate runs as user root, so this is the only case where the update script is capable of reading Apache's log file


The problem can be solved in several ways:

  • Add user www-data to group adm. The drawback of this method is that www-data can now access every file that belongs to that group, which might be more than is intended.
  • Change the permissions that the Apache daemon uses to create log files to 644 (instead of 640). The drawback of this method is that now everybody who has access to the system can read the web server logs. Also the changes might be reverted by a future update of the apache package.
    • If this solution is chosen, the permissions for virtual host subfolders of /var/log/apache2 must also be changed, to 755 rather than 644, because a directory needs the execute bit before the awstats update script can look into it.
    • Additionally, it might be advisable to change the logrotate configuration so that the rotated files also have permissions 644.
  • Change the awstats cron script so that the update script runs as user root. The drawback of this method is that the update script now has write access to the entire system. Also the change might be reverted by a future update of the awstats package.


For the moment, I have decided to add www-data to the adm group, because this seems to be the change with the least impact.

adduser www-data adm
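
To verify that the change is in place, the group membership can be checked with id. A minimal sketch, with user_in_group being a made-up function name:

```shell
# Hypothetical helper: succeed if the user (first argument) is a
# member of the group (second argument), according to "id -nG".
# Assumes group names contain no spaces.
user_in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qxF -- "$2"
}
```

For example: user_in_group www-data adm && echo "www-data can read files of group adm".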