AWStats
AWStats is a log file analyzer. I use it for analyzing the Apache web server log files (which is also where the software has its origins).
Debian package
The Debian package to install is
awstats
An optional package to map IP addresses to geo locations is
libgeo-ipfree-perl
AWStats configuration
General approach
The AWStats configuration is located in
/etc/awstats
The script that updates the AWStats statistics processes the main configuration file /etc/awstats/awstats.conf, and files that match this pattern:
/etc/awstats/awstats.*.conf
The idea is that every virtual host for which separate log files exist has its own configuration file. The naming scheme I adopt is this:
awstats.<vhost name>.conf
For instance,
awstats.www.herzbube.ch.conf
awstats.wiki.herzbube.ch.conf
[...]
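The naming scheme can be walked in both directions with a few lines of shell. This is only a sketch: the function names awstats_conf_for_vhost and awstats_list_vhosts are made up for illustration and are not part of the AWStats package.

```shell
# Hypothetical helper: map a vhost name to its configuration file path.
awstats_conf_for_vhost() {
    printf '/etc/awstats/awstats.%s.conf\n' "$1"
}

# Hypothetical helper: list the vhost names for which a configuration
# file exists. The glob awstats.*.conf matches neither awstats.conf
# nor awstats.conf.local.
awstats_list_vhosts() {
    conf_dir="${1:-/etc/awstats}"
    for f in "$conf_dir"/awstats.*.conf; do
        [ -e "$f" ] || continue     # glob matched nothing
        name="${f##*/}"             # strip the directory part
        name="${name#awstats.}"     # strip the "awstats." prefix
        name="${name%.conf}"        # strip the ".conf" suffix
        printf '%s\n' "$name"
    done
}
```

Calling awstats_list_vhosts without an argument looks in /etc/awstats.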
The Debian way
The official method for configuring AWStats is to create a clone of the main configuration file /etc/awstats/awstats.conf for each virtual host, and to modify those few properties that differ from the defaults for that vhost. In my opinion this is a horrible approach, because it means duplicating more than 1500 lines - over and over and over... When the time comes to upgrade to a new AWStats version, you will have to update all those vhost-specific files, merging your local modifications with the upstream changes. Argh!
A much better approach is the one outlined in the Debian README (/usr/share/doc/awstats/README.Debian.gz):
- Leave the main configuration file awstats.conf alone
- Create vhost-specific configuration files which on their first line include the main configuration file, then continue to specify vhost-specific settings that override the defaults in the main configuration file
- Add any settings that override the defaults in the main configuration file, but are shared among all vhosts, to awstats.conf.local. This file is included at the end of the main configuration file.
awstats.conf.local
These settings override the defaults and are shared among all vhosts:
AllowFullYearView=3
# This setting is required so that awstats.conf can be processed
# on its own, i.e. when not included from a vhost configuration
# file. If included from a vhost config, the vhost will override
# this setting.
SiteDomain="default"
Notes:
- AllowFullYearView is set to 3, which means that stats for a full year can be viewed not only from the command line, but also when awstats is run as a CGI
Example configuration for www.herzbube.ch
The following configuration settings are in use for the www.herzbube.ch vhost. The other vhosts are all configured in the same way.
Include "/etc/awstats/awstats.conf"
LogFile="/usr/share/awstats/tools/logresolvemerge.pl /var/log/apache2/www.herzbube.ch/access.log |"
SiteDomain="www.herzbube.ch"
HostAliases="herzbube.ch 82.195.228.21"
DirData="/var/lib/awstats/www.herzbube.ch"
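Because all vhosts follow the same pattern, the vhost-specific file could be generated instead of typed. A minimal sketch, assuming the log directory and the data directory are both named after the vhost as in the example above. The function name print_awstats_vhost_conf and passing the aliases as a second argument are assumptions for illustration.

```shell
# Hypothetical helper: print a vhost-specific AWStats configuration.
#   $1 = vhost name, e.g. www.herzbube.ch
#   $2 = space-separated host aliases for that vhost
print_awstats_vhost_conf() {
    vhost="$1"
    aliases="$2"
    cat <<EOF
Include "/etc/awstats/awstats.conf"
LogFile="/usr/share/awstats/tools/logresolvemerge.pl /var/log/apache2/$vhost/access.log |"
SiteDomain="$vhost"
HostAliases="$aliases"
DirData="/var/lib/awstats/$vhost"
EOF
}
```

The output would be redirected into the matching file, for instance /etc/awstats/awstats.www.herzbube.ch.conf.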
Important: Because we want to store vhost-specific data in a vhost-specific data directory, that directory must be manually created and given the same permissions as the parent folder.
mkdir /var/lib/awstats/www.herzbube.ch
chown --reference=/var/lib/awstats /var/lib/awstats/www.herzbube.ch
chmod --reference=/var/lib/awstats /var/lib/awstats/www.herzbube.ch
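With more than a handful of vhosts, the three commands can be wrapped in a small function. This is a sketch only: the name create_awstats_datadir is made up, and the --reference option assumes GNU coreutils, which is what Debian ships.

```shell
# Hypothetical helper: create a vhost data directory that has the same
# owner, group and mode as its parent directory.
#   $1 = parent directory, e.g. /var/lib/awstats
#   $2 = vhost name, e.g. www.herzbube.ch
create_awstats_datadir() {
    parent="$1"
    dir="$parent/$2"
    mkdir -p "$dir" &&
    chown --reference="$parent" "$dir" &&
    chmod --reference="$parent" "$dir"
}
```

For the example vhost the call would be: create_awstats_datadir /var/lib/awstats www.herzbube.ch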
Apache configuration
An example with some guidelines for the Apache configuration can be found here:
/usr/share/doc/awstats/examples/apache.conf
But actually we don't have to do much, because our default Apache configuration is already set up with this:
- CGI script execution is already enabled for /usr/lib/cgi-bin
- Access to /usr/share/awstats is already granted by default
- No special options are set and no overrides are allowed for /usr/share/awstats
So the main thing that remains to be done is protecting the statistics from unauthenticated access. Here is the configuration:
# ============================================================
# AWStats configuration
# ============================================================
# Among other things, this provides the graphics for the bar charts
Alias /awstats-icon/ /usr/share/awstats/icon/
# Restrict access to authenticated users
<Directory /usr/lib/cgi-bin>
<Files awstats.pl>
AuthName "awstats"
AuthType Basic
AuthBasicProvider ldap
Require valid-user
</Files>
</Directory>
Note: I am placing the configuration in the global /etc/apache2/conf-available/000-pelargir.conf, not in any of the vhost configurations, because AWStats is available in all vhosts.
Statistics update
cron
The awstats Debian package installs the following file:
/etc/cron.d/awstats
This file contains a crontab entry that triggers the awstats statistics update process by running this script:
/usr/share/awstats/tools/update.sh
The script processes the following files:
- The main configuration file /etc/awstats/awstats.conf, which refers to the main Apache log file /var/log/apache2/access.log
- All configuration files that match the pattern /etc/awstats/awstats.*.conf. As explained further up, on my system these refer to virtual host log files.
The default schedule is to run the update script every 10 minutes. This is much too often for my taste, so I changed this to run every hour, at 10 minutes past the hour.
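In cron syntax, "every 10 minutes" is written */10 * * * * and "every hour, at 10 minutes past the hour" is written 10 * * * *. To double-check which schedule is currently in effect, something like the following could be used. It is a sketch under the assumption that the file uses the usual /etc/cron.d layout (five schedule fields, then the user, then the command). The function name cron_schedules is made up.

```shell
# Hypothetical helper: print the five schedule fields of every job line
# in a cron.d-style file, skipping comments, blank lines and variable
# assignments such as MAILTO=root.
cron_schedules() {
    awk '
        /^[ \t]*#/ { next }     # comment line
        NF == 0    { next }     # blank line
        $1 ~ /=/   { next }     # environment variable assignment
        { print $1, $2, $3, $4, $5 }
    ' "$1"
}
```

After the change described above, cron_schedules /etc/cron.d/awstats should include a line that reads 10 * * * *.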
Conflict with log rotation
The problem:
- The awstats cron job runs every hour, at 10 minutes past the hour
- Log rotation is triggered by cron.daily, which runs at 06:25 in the morning
- This means that between 06:10 and 06:25 there is a 15-minute gap that is never seen by the awstats cron job
Example:
- The awstats cron job runs at 06:10. It processes access.log.
- All log output that is produced between 06:10 and 06:25 is written to Apache's access.log
- At 06:25 log files are rotated, so the log output from the previous 15 minutes is now in access.log.1.
- The awstats cron job runs at 07:10. It processes access.log. It doesn't see the stuff that is now in access.log.1.
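The size of the gap depends only on the minute at which the hourly cron job runs and the minute at which log rotation happens. A small sketch of that arithmetic, assuming an hourly job. The function name unseen_gap_minutes is made up.

```shell
# Hypothetical helper: number of minutes of log output that an hourly
# job never sees.
#   $1 = minute of the hour at which the hourly job runs
#   $2 = minute of the hour at which log rotation happens
unseen_gap_minutes() {
    # Strip one leading zero so that "08" is not parsed as an octal number.
    cron_minute=$(( ${1#0} + 0 ))
    rotate_minute=$(( ${2#0} + 0 ))
    echo $(( (rotate_minute - cron_minute + 60) % 60 ))
}
```

For the schedule described above, unseen_gap_minutes 10 25 prints 15.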
An elegant solution to this problem is to run the awstats update routine before log rotation happens. The Debian package for Apache has provisions for this in its logrotate config file:
root@pelargir:~# cat /etc/logrotate.d/apache2
/var/log/apache2/*.log {
[...]
prerotate
if [ -d /etc/logrotate.d/httpd-prerotate ]; then \
run-parts /etc/logrotate.d/httpd-prerotate; \
fi; \
endscript
}
The intent is that you place a script that runs the awstats update routine in this folder:
/etc/logrotate.d/httpd-prerotate
So what we do is this:
mkdir /etc/logrotate.d/httpd-prerotate
ln -s /usr/share/awstats/tools/update.sh /etc/logrotate.d/httpd-prerotate/awstats
Notes:
- logrotate runs as user root, whereas the awstats cron job runs as user www-data. This will cause problems if the awstats cron job ever needs to write to a file that has previously been created by the logrotate update routine.
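One thing to watch out for when naming the symlink: by default run-parts only executes files whose names consist entirely of ASCII letters, digits, underscores and hyphens. A link named awstats is fine, but one named awstats.sh would be skipped silently. The check can be sketched like this. The function name run_parts_name_ok is made up.

```shell
# Hypothetical helper: succeed if run-parts would accept the given file
# name under its default naming rules.
run_parts_name_ok() {
    case "$1" in
        ''|*[!A-Za-z0-9_-]*) return 1 ;;   # empty, or contains another character
        *) return 0 ;;
    esac
}
```

On the real system, run-parts --test /etc/logrotate.d/httpd-prerotate lists the scripts that would be run without actually running them.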
Manual update
The statistics update may also be triggered manually by a user who accesses awstats.pl via web browser, but this is disabled by default in the AWStats configuration. To enable this, the setting AllowToUpdateStatsFromBrowser must be set to 1.
Permissions
Apache writes log files with restricted permissions:
-rw-r----- 1 root adm 0 Jun 10 06:25 /var/log/apache2/access.log
Unfortunately, those restricted permissions also prevent the awstats statistics update script from accessing the rotated log files, because the update script usually runs as user www-data:
- When the user manually triggers a statistics update via web browser, the update script runs using the permissions of the web server, which on Debian means as user www-data
- The awstats cron job /etc/cron.d/awstats specifies to also run the update as user www-data
- The update script that is run by logrotate runs as user root, so this is the only case where the update script is capable of reading Apache's log file
The problem can be solved in several ways:
- Add user www-data to group adm. The drawback of this method is that www-data can now access every file that belongs to that group, which might be more than is intended.
- Change the permissions that the Apache daemon uses to create log files to 644 (instead of 640). The drawback of this method is that now everybody who has access to the system can read the web server logs. Also the changes might be reverted by a future update of the apache package.
  - If this solution is chosen, the permissions for virtual host subfolders of /var/log/apache2 must also be opened up, so that the awstats update script can look into those folders. For directories this means 755 rather than 644, because a directory can only be entered if the execute bit is set.
  - Additionally, it might be advisable to change the logrotate configuration so that the rotated files also have permissions 644.
- Change the awstats cron script so that the update script runs as user root. The drawback of this method is that the update script now has write access to the entire system. Also the change might be reverted by a future update of the awstats package.
For the moment, I have decided to add www-data to the adm group, because this seems to be the change with the least impact.
adduser www-data adm
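Whether the group membership is in place can be verified with id. A minimal sketch, where the function name user_in_group is made up. Note that processes that are already running, such as Apache, keep their old group list until they are restarted.

```shell
# Hypothetical helper: succeed if the given user is a member of the
# given group.
#   $1 = user name, $2 = group name
user_in_group() {
    # id -nG prints the names of all groups the user belongs to,
    # separated by spaces.
    for g in $(id -nG "$1" 2>/dev/null); do
        [ "$g" = "$2" ] && return 0
    done
    return 1
}
```

Usage: user_in_group www-data adm && echo "www-data can read the Apache logs"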