AWStats
AWStats is a log file analyzer. I use it for analyzing the Apache web server log files (which is also where the software has its origins).
Debian package
The Debian package to install is
awstats
An optional package to map IP addresses to geo locations is
libgeo-ipfree-perl
AWStats configuration
General approach
The AWStats configuration is located in
/etc/awstats
The script that updates the AWStats statistics processes the main configuration file /etc/awstats/awstats.conf and all files that match this pattern:
/etc/awstats/awstats.*.conf
The idea is that every virtual host for which separate log files exist has its own configuration file. The naming scheme I adopt is this:
awstats.<vhost name>.conf
For instance,
awstats.www.herzbube.ch.conf
awstats.wiki.herzbube.ch.conf
[...]
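To check which vhosts the naming scheme currently covers, the vhost names can be recovered from the file names. The following is a minimal sketch in POSIX shell; the function name list_awstats_vhosts is made up for this example and is not part of the AWStats package.

```shell
# List the vhost names encoded in awstats.<vhost name>.conf files.
# The directory defaults to /etc/awstats. Hypothetical helper, not part
# of the awstats package.
list_awstats_vhosts() {
    conf_dir="${1:-/etc/awstats}"
    for f in "$conf_dir"/awstats.*.conf; do
        [ -e "$f" ] || continue      # the glob did not match anything
        name="${f##*/}"              # strip the directory
        name="${name#awstats.}"      # strip the "awstats." prefix
        name="${name%.conf}"         # strip the ".conf" suffix
        printf '%s\n' "$name"
    done
}
```

Called without an argument the function looks at /etc/awstats. The files awstats.conf and awstats.conf.local are not listed because they do not match the pattern.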
The Debian way
The official method for configuring AWStats is to create a clone of the main configuration file /etc/awstats/awstats.conf for each virtual host, and to modify those few properties that differ from the defaults for that vhost. In my opinion this is a horrible approach, because it means duplicating more than 1500 lines - over and over and over... When the time comes to upgrade to a new AWStats version, you will have to update all those vhost-specific files, merging your local modifications with the upstream changes. Argh!
A much better approach is the one outlined in the Debian README (/usr/share/doc/awstats/README.Debian.gz):
- Leave the main configuration file awstats.conf alone
- Create vhost-specific configuration files which on their first line include the main configuration file, then continue to specify vhost-specific settings that override the defaults in the main configuration file
- Add any settings that override the defaults in the main configuration file, but are shared among all vhosts, to awstats.conf.local. This file is included at the end of the main configuration file.
awstats.conf.local
These settings override the defaults and are shared among all vhosts:
AllowFullYearView=3

# This setting is required so that awstats.conf can be processed
# on its own, i.e. when not included from a vhost configuration
# file. If included from a vhost config, the vhost will override
# this setting.
SiteDomain="default"
Notes:
- AllowFullYearView is set to 3, which means that stats for a full year can be viewed not only from the command line, but also when AWStats is run as a CGI
Example configuration for www.herzbube.ch
The following configuration settings are in use for the www.herzbube.ch vhost. The other vhosts are all configured in the same way.
Include "/etc/awstats/awstats.conf"
LogFile="/usr/share/awstats/tools/logresolvemerge.pl /var/log/apache2/www.herzbube.ch/access.log |"
SiteDomain="www.herzbube.ch"
HostAliases="herzbube.ch 82.195.228.21"
DirData="/var/lib/awstats/www.herzbube.ch"
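Since all vhosts are configured in the same way, the vhost-specific file can be generated from the vhost name. This is only a sketch: the function name print_vhost_conf is hypothetical, and it assumes that the log file and the data directory follow the same per-vhost layout as in the example for www.herzbube.ch.

```shell
# Print a vhost-specific AWStats configuration to stdout.
# Usage: print_vhost_conf <vhost name> [host aliases]
# Hypothetical helper; the paths assume the per-vhost layout used on this page.
print_vhost_conf() {
    vhost="$1"
    aliases="${2:-}"
    cat <<EOF
Include "/etc/awstats/awstats.conf"
LogFile="/usr/share/awstats/tools/logresolvemerge.pl /var/log/apache2/$vhost/access.log |"
SiteDomain="$vhost"
HostAliases="$aliases"
DirData="/var/lib/awstats/$vhost"
EOF
}
```

For instance, print_vhost_conf wiki.herzbube.ch > /etc/awstats/awstats.wiki.herzbube.ch.conf would write the file for the wiki vhost. The data directory still has to be created by hand.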
Important: Because we want to store vhost-specific data in a vhost-specific data directory, that directory must be manually created and given the same permissions as the parent folder.
mkdir /var/lib/awstats/www.herzbube.ch
chown --reference=/var/lib/awstats /var/lib/awstats/www.herzbube.ch
chmod --reference=/var/lib/awstats /var/lib/awstats/www.herzbube.ch
Apache configuration
An example with some guidelines for the Apache configuration can be found here:
/usr/share/doc/awstats/examples/apache.conf
But actually we don't have to do much, because our default Apache configuration is already set up with this:
- CGI script execution is already enabled for /usr/lib/cgi-bin
- Access to /usr/share/awstats is already granted by default
- No special options are set and no overrides are allowed for /usr/share/awstats
So the main thing that remains to be done is protecting the statistics from unauthenticated access. Here is the configuration:
# ============================================================
# AWStats configuration
# ============================================================
# Among other things, this provides the graphics for the bar charts
Alias /awstats-icon/ /usr/share/awstats/icon/

# Restrict access to authenticated users
<Directory /usr/lib/cgi-bin>
  <Files awstats.pl>
    AuthName "awstats"
    AuthType Basic
    AuthBasicProvider ldap
    Require valid-user
  </Files>
</Directory>
Note: I am placing the configuration in the global /etc/apache2/conf-available/000-pelargir.conf, not in any of the vhost configurations, because AWStats is available in all vhosts.
Statistics update
cron
The awstats Debian package installs the following file:
/etc/cron.d/awstats
This file contains a crontab entry that triggers the awstats statistics update process by running this script:
/usr/share/awstats/tools/update.sh
The script processes the following files:
- The main configuration file /etc/awstats/awstats.conf, which refers to the main Apache log file /var/log/apache2/access.log
- All configuration files that match the pattern /etc/awstats/awstats.*.conf. As explained further up, on my system these refer to virtual host log files.
The default schedule is to run the update script every 10 minutes. This is much too often for my taste, so I changed this to run every hour, at 10 minutes past the hour.
Conflict with log rotation
The problem:
- The awstats cron job runs every hour, at 10 minutes past the hour
- Log rotation is triggered by cron.daily, which runs at 06:25 in the morning
- This means that between 06:10 and 06:25 there is a 15-minute gap that is never seen by the awstats cron job
Example:
- The awstats cron job runs at 06:10. It processes access.log.
- All log output that is produced between 06:10 and 06:25 is written to Apache's access.log.
- At 06:25 log files are rotated, so the log output from the previous 15 minutes is now in access.log.1.
- The awstats cron job runs at 07:10. It processes access.log. It doesn't see the stuff that is now in access.log.1.
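The size of the gap follows directly from the two schedules. As a small worked example (the function name unseen_minutes is made up, and the sketch assumes that the cron job runs once per hour and that minutes are passed without leading zeros):

```shell
# Minutes of log output that end up in access.log.1 without ever being
# processed, given the minute of the hourly awstats run and the minute
# at which log rotation happens. Hypothetical helper for illustration.
unseen_minutes() {
    cron_minute="$1"
    rotate_minute="$2"
    echo $(( (rotate_minute - cron_minute + 60) % 60 ))
}
```

With the schedule described here, unseen_minutes 10 25 prints 15. If log rotation happened at 5 minutes past the hour instead, the gap would grow to 55 minutes, because the last update run before the rotation would be the one from the previous hour.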
An elegant solution to this problem is to run the awstats update routine before log rotation happens. The Debian package for Apache has provisions for this in its logrotate config file:
root@pelargir:~# cat /etc/logrotate.d/apache2
/var/log/apache2/*.log {
  [...]
  prerotate
    if [ -d /etc/logrotate.d/httpd-prerotate ]; then \
      run-parts /etc/logrotate.d/httpd-prerotate; \
    fi; \
  endscript
}
The intent is that you place a script that runs the awstats update routine in this folder:
/etc/logrotate.d/httpd-prerotate
So what we do is this:
mkdir /etc/logrotate.d/httpd-prerotate
ln -s /usr/share/awstats/tools/update.sh /etc/logrotate.d/httpd-prerotate/awstats
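One thing to watch out for: by default run-parts only executes files whose names consist of letters, digits, underscores and hyphens, so a symlink named awstats works while one named update.sh would be silently skipped. The following sketch lists what would be run and what would be skipped. check_hook_dir is a made-up helper, not a tool that ships with Debian, and it only approximates the run-parts rules.

```shell
# Report which entries of a run-parts directory would be executed.
# The directory defaults to the Apache prerotate hook folder.
# Hypothetical helper that approximates the default run-parts naming rule.
check_hook_dir() {
    dir="${1:-/etc/logrotate.d/httpd-prerotate}"
    for f in "$dir"/*; do
        [ -e "$f" ] || continue      # empty folder or dangling symlink
        name="${f##*/}"
        case "$name" in
            *[!A-Za-z0-9_-]*)
                echo "skipped (bad name): $name" ;;
            *)
                if [ -x "$f" ]; then
                    echo "ok: $name"
                else
                    echo "skipped (not executable): $name"
                fi ;;
        esac
    done
}
```

On a Debian system, run-parts --test /etc/logrotate.d/httpd-prerotate gives the authoritative answer, because it prints the scripts that would be run without running them.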
Notes:
- logrotate runs as user root, whereas the awstats cron job runs as user www-data. This will cause problems if the awstats cron job ever needs to write to a file that has previously been created by the update routine run from logrotate.
Manual update
The statistics update may also be triggered manually by a user who accesses awstats.pl via web browser, but this is disabled by default in the AWStats configuration. To enable this, the setting AllowToUpdateStatsFromBrowser must be set to 1.
Permissions
Apache writes log files with restricted permissions:
-rw-r----- 1 root adm 0 Jun 10 06:25 /var/log/apache2/access.log
Unfortunately, those restricted permissions also prevent the awstats statistics update script from accessing the rotated log files, because the update script usually runs as user www-data:
- When the user manually triggers a statistics update via web browser, the update script runs using the permissions of the web server, which on Debian means as user www-data
- The awstats cron job /etc/cron.d/awstats specifies to also run the update as user www-data
- The update script that is run by logrotate runs as user root, so this is the only case where the update script is capable of reading Apache's log file
The problem can be solved in several ways:
- Add user www-data to group adm. The drawback of this method is that www-data can now access every file that belongs to that group, which might be more than is intended.
- Change the permissions that the Apache daemon uses to create log files to 644 (instead of 640). The drawback of this method is that now everybody who has access to the system can read the web server logs. Also the changes might be reverted by a future update of the apache package.
  - If this solution is chosen, the permissions for virtual host subfolders of /var/log/apache2 must also be opened up to 755 (a folder needs the execute bit to be entered, so 644 is not enough), so that the awstats update script can look into those folders.
  - Additionally, it might be advisable to change the logrotate configuration so that the rotated files also have permissions 644.
- Change the awstats cron script so that the update script runs as user root. The drawback of this method is that the update script now has write access to the entire system. Also the change might be reverted by a future update of the awstats package.
For the moment, I have decided to add www-data to the adm group, because this seems to be the change with the least impact.
adduser www-data adm
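To verify that the change took effect, the group list of www-data can be inspected. A minimal sketch with two made-up helper functions; id -nG is the only external command used.

```shell
# True if the first argument is equal to one of the remaining arguments.
contains_word() {
    word="$1"
    shift
    for candidate in "$@"; do
        [ "$candidate" = "$word" ] && return 0
    done
    return 1
}

# True if user $1 is a member of group $2 (primary or supplementary).
# Hypothetical helper, not part of any package.
user_in_group() {
    # Intentionally unquoted: id prints the group names separated by spaces.
    contains_word "$2" $(id -nG "$1" 2>/dev/null)
}
```

For instance, user_in_group www-data adm && echo ok prints ok once the membership is in place. Processes that are already running keep the group list they started with, so restarting Apache after the change is the safe choice.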