Collectd

collectd is a daemon that periodically collects system and application performance metrics and provides mechanisms to store the values in a variety of ways, for example in RRD files.

These days I am not using collectd all the time anymore. Instead I only install it temporarily if I need to diagnose a problem over a time period. The information on this page is likely to be no longer 100% accurate.

Debian packages

collectd

Note: Some plugins require libraries and/or daemons provided by separate packages. These packages are installed through a "Recommends" dependency of the main collectd package. This is explained in /usr/share/doc/collectd/README.Debian.plugins.

References

collectd Website: http://collectd.org/
configuration: man collectd.conf
plugins: some plugins have a small separate man page; try man collectd-<plugin>
collectd wiki: Also has interesting information, notably the table of plugins lists some plugins that are not documented on man pages. http://collectd.org/wiki/index.php/Table_of_Plugins

Configuration

Overview

The configuration is stored in

/etc/collectd/collectd.conf

Global options

the default settings seem to be OK
default interval to query plugins = 10 seconds
default data directory = /var/lib/collectd
default directory to load plugins from = /usr/lib/collectd

Active plugins

Currently the following plugins are in use:

apache: Apache web server statistics
cpu: CPU load; Note: this is not documented on any man page
cpufreq: CPU frequency
df: Filesystem usage
entropy: Entropy available on the system
interface: Traffic on network interfaces
irq: Triggered IRQs
load: Load averages (same as printed by the w command line utility)
memory: Memory usage
ntpd: NTP statistics (what statistics, exactly?)
processes: Statistics for selected processes
rrdtool: Stores statistics in RRD files; Note: requires separate package rrdtool
swap: Swap filesystem usage
syslog: Lets collectd send its output to syslog
users: Number of users

Interesting plugins

The following plugins look interesting, but I have not yet got around to finding out more about them. Here are some general notes:

bind
- DNS statistics obtained by querying the BIND name server
- The name server must first be configured to provide the required statistics
- For details see http://collectd.org/wiki/index.php/Plugin:BIND
email
- see man page collectd-email
- opens a socket that can be used for inserting email statistics into the collectd system
- can be used with the SpamAssassin plugin Mail::SpamAssassin::Plugin::Collectd, and others
iptables
- see man page collectd.conf
mbmon
- Hardware monitoring without kernel support (seems to have some sort of direct access to the motherboard sensors)
- requires separate package mbmon
mysql: MySQL database server statistics
netlink
- network statistics acquired directly from the kernel
- provides far more details than the interface plugin
postgresql
- See http://collectd.org/wiki/index.php/Plugin:PostgreSQL
sensors
- collects statistics provided by the package lm_sensors
tail
- can be used to collect statistics from log files

Unused plugins

The purpose of the following plugins is either unknown to me, or the plugin is of no interest to me. I list them for completeness' sake because I found a reference to them in collectd.conf:

apcups: apcupsd statistics (a daemon for controlling APC UPSes)
ascent: Statistics about an Ascent server (a free server for the "World of Warcraft" game)
battery: Battery statistics
conntrack: ?
contextswitch: ?
csv: writes the collected values to CSV files
curl: ?
curl_json: ?
curl_xml: ?
dbi: ?
disk: Hard drive usage
dns: DNS statistics, obtained by monitoring network traffic
exec: forks off an executable either to receive values or to dispatch notifications to the outside world; see man collectd-exec
ipmi: ?
ipvs: ?
java: ?
libvirt: Collects CPU, disk and network load for virtualized guests on the machine; uses libvirt (http://libvirt.org)
madwifi: ?
memcachec: ?
memcached: memcached statistics (cache utilization, memory and bandwidth used)
multimeter: ?
network: ? (see man collectd.conf)
nfs: ?
nginx: nginx statistics (an HTTP and mail server/proxy)
notify-desktop: ?
notify-email: ?
nut: collect data from UPSes
olsrd: ?
openvpn: ?
perl: embeds a Perl interpreter into collectd so that plugins written in perl can be used; see man collectd-perl
pinba: ?
powerdns: statistics for PowerDNS nameserver and/or PowerDNS recursor
protocols: ?
python: ?
rrdcached: ?
serial: ?
snmp: SNMP statistics; see man collectd-snmp
table: ?
teamspeak2: teamspeak2 server statistics
ted: ?
thermal: ?
tokyotyrant: ?
unixsock: ? (see man collectd.conf)
uptime: ?
uuid: causes the hostname to be taken from the machine's UUID, which is usually taken from the machine's BIOS; this is useful if the machine is running in a virtual environment such as Xen
vserver: ?
wireless: ?
write_http: ?

Disabled plugins

Although I would have liked to use the following plugins, I had to disable them for various reasons.

hddtemp

Hard disk temperature; Note: requires daemon installed by separate package hddtemp
produces segmentation fault

ping

Connection to hosts
produces output like the following, and crashes the collectd daemon

collectd: liboping.c:168: ping_timeval_sub: Assertion `(res->tv_sec > 0) || ((res->tv_sec == 0) && (res->tv_usec > 0))' failed.

tcpconns

TCP connections
produces segmentation fault

vmem

Virtual memory usage
produces segmentation fault

Plugin configuration

Some plugins require special configuration:

apache
- URL http://localhost/server-status?auto
df
- Filter out special filesystems (e.g. /dev) by specifying a filesystem type filter
- FSType "ext3"
dns
- Interface "eth1"
- use the interface that points to the Internet since I currently do not have my own DNS service
mysql
- Host "localhost"
- User collectd
- Password secret
- the database user collectd is generated automatically when the Debian package is installed
- the database user only has very general privileges
- the password is randomly generated
ping
- Host "www.google.ch"
- Host "alcarondas"
- TTL 255
processes
- Process "/usr/sbin/mysqld"
- Process "/usr/sbin/apache2"
- Process "/usr/sbin/slapd"
- Process "/usr/sbin/smbd"
- Process "/usr/sbin/nscd"
- Process "/usr/sbin/named"
rrdtool
- DataDir "/var/lib/collectd/rrd"
syslog
- LogLevel warning
- this is essential! the default log level is "info", which produces megatons of output and creates multi-megabyte sized logcheck digests
vmem
- Verbose true
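The settings above end up in /etc/collectd/collectd.conf as one block per plugin. As a rough illustration of that layout, the following Python sketch renders a few of the option lines from this page into LoadPlugin lines and <Plugin> blocks. The helper names (render_plugin, render_config) and the selection of plugins are my own and only serve the example; the generated text should be compared against man collectd.conf before it is used.

```python
# Option lines copied verbatim from the list on this page.
# The selection of plugins is arbitrary and only serves as an illustration.
PLUGIN_OPTIONS = {
    "df": ['FSType "ext3"'],
    "rrdtool": ['DataDir "/var/lib/collectd/rrd"'],
    "syslog": ["LogLevel warning"],
    "processes": [
        'Process "/usr/sbin/mysqld"',
        'Process "/usr/sbin/apache2"',
    ],
}


def render_plugin(name, options):
    """Return a LoadPlugin line followed by a <Plugin> block for one plugin."""
    lines = [f"LoadPlugin {name}", f"<Plugin {name}>"]
    lines += [f"  {option}" for option in options]
    lines.append("</Plugin>")
    return "\n".join(lines)


def render_config(plugin_options):
    """Render all plugins, separated by blank lines."""
    return "\n\n".join(
        render_plugin(name, options) for name, options in plugin_options.items()
    )


if __name__ == "__main__":
    print(render_config(PLUGIN_OPTIONS))
```

Keeping the option lines as plain strings avoids guessing which values need quotes; they are written out exactly as noted on this page.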

Access to collected data

Overview

collectd is able to write data to CSV (comma-separated values) and RRD (round robin database, see the RRDtool chapter further down) files. However, it does not create graphs from these files. This task is delegated to separate scripts and packages. The collectd package provides a number of such scripts/packages in

/usr/share/doc/collectd/examples

See the file /usr/share/doc/collectd/README.Debian.gz for details.

In my experience, though, these scripts do not work properly, or at least not without more work than I care to invest. For this reason, I am currently using a manually installed instance of collectd-web, a modern web interface that pretty much "just works".

collectd-web

At the time of writing there is no Debian package for collectd-web, so it needs to be manually installed:

cd /var/www
git clone https://github.com/httpdss/collectd-web.git

The following general snippet in /etc/apache2/conf-enabled/000-pelargir.conf makes collectd-web available on all virtual hosts:

# ============================================================
# Settings for collectd-web
# ============================================================
Alias /collectd-web/ /var/www/collectd-web/
<Directory /var/www/collectd-web/>
  Require all granted
  Options Indexes FollowSymLinks MultiViews
  AllowOverride all
</Directory>

And that's it - virtually zero-conf on the part of collectd-web.

collectd2html.pl

When executed, collectd2html.pl generates a directory with graphs and a static HTML file which embeds those graphs.

In my tests, collectd2html.pl only produced empty graphs.

collection.cgi

collection.cgi generates graphs on the fly. It requires the following perl Debian packages to be installed:

librrds-perl
liburi-perl
libhtml-parser-perl

Installation:

cp /usr/share/doc/collectd/examples/collection.cgi /usr/lib/cgi-bin
chmod +x /usr/lib/cgi-bin/collection.cgi

Access:

http://pelargir/cgi-bin/collection.cgi

collection3

collection3 is the successor to collection.cgi. It is a small package of its own that consists of multiple files organised in several directories in typical UNIX style (etc, bin, lib, ...).

The following perl Debian packages need to be installed:

librrds-perl
libconfig-general-perl
libhtml-parser-perl (provides HTML::Entities)
libregexp-common-perl

Installation:

cp -Rp /usr/share/doc/collectd/examples/collection3 /usr/lib/cgi-bin/collection3
mv /usr/lib/cgi-bin/collection3/etc /etc/collection3
ln -s /etc/collection3 /usr/lib/cgi-bin/collection3/etc

Although the example files provided by the collectd package were set up correctly when I installed them the first time, the following things should be checked again after each subsequent update:

only files in /usr/lib/cgi-bin/collection3/bin may have the executable bit set
only files in /usr/lib/cgi-bin/collection3/bin may be served by apache "as is"; all other directories should contain a .htaccess file that provides some restrictions
- etc/.htaccess and lib/.htaccess should look like this

deny from all

- share/.htaccess should look like this

Options -ExecCGI
SetHandler none

Note: For the .htaccess files to take effect, /etc/apache2/conf.d/pelargir.conf must say "AllowOverride All" for the directory /usr/lib/cgi-bin
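The executable-bit check described above can be automated after each update. The following Python sketch walks a collection3 installation and lists regular files outside bin that have an execute bit set. The function name executables_outside_bin is hypothetical. Because os.walk does not descend into symlinked directories by default, the etc symlink created during installation is not checked by this sketch.

```python
import os
import stat


def executables_outside_bin(root):
    """Return relative paths of regular files below root that have any
    execute bit set and are not located in root/bin."""
    offenders = []
    # os.walk does not follow symlinked directories (e.g. the etc symlink).
    for dirpath, _dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        if rel_dir == "bin" or rel_dir.startswith("bin" + os.sep):
            continue  # files in bin are allowed to be executable
        for filename in filenames:
            mode = os.lstat(os.path.join(dirpath, filename)).st_mode
            if stat.S_ISREG(mode) and mode & 0o111:
                offenders.append(os.path.normpath(os.path.join(rel_dir, filename)))
    return sorted(offenders)


if __name__ == "__main__":
    for path in executables_outside_bin("/usr/lib/cgi-bin/collection3"):
        print("unexpected executable:", path)
```

An empty result means that only files in bin are executable. The .htaccess files still have to be checked by hand.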

Access:

http://pelargir.herzbube.ch/cgi-bin/collection3/bin/

hddtemp

Debian packages

hddtemp

References

hddtemp Website: http://www.guzu.net/linux/hddtemp.php

Configuration

DebConf questions:

Do you want /usr/sbin/hddtemp to be installed SUID root? = no
Interval between two checks = 10
Do you want to start the hddtemp daemon on startup? = yes
Interface to listen on = 127.0.0.1
Port to listen on = 7634
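With the answers above, the hddtemp daemon listens on 127.0.0.1:7634. The following Python sketch shows how a client could read and parse the daemon's reply. It assumes that the daemon sends one reply of the form |device|model|temperature|unit| per drive and then closes the connection, and that "|" is the separator; these are assumptions that should be verified against man hddtemp. The function names and the drive names in the usage example are made up.

```python
import socket


def parse_hddtemp(reply, separator="|"):
    """Parse a reply such as '|/dev/sda|MODEL|35|C|' into a list of dicts.

    Assumption: records of several drives are simply concatenated, so two
    separators in a row mark the boundary between two records.
    """
    records = []
    body = reply.strip().strip(separator)
    if not body:
        return records
    for chunk in body.split(separator * 2):
        device, model, temperature, unit = chunk.split(separator)
        records.append({
            "device": device,
            "model": model,
            # Non-numeric values (e.g. for a sleeping drive) become None.
            "temperature": int(temperature) if temperature.isdigit() else None,
            "unit": unit,
        })
    return records


def query_hddtemp(host="127.0.0.1", port=7634, timeout=5.0):
    """Connect to the daemon, read until it closes the connection, return text."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")
```

Usage: parse_hddtemp(query_hddtemp()) on a machine where the daemon is running, or parse_hddtemp("|/dev/sda|ExampleDisk|35|C|") to try the parser without a daemon.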

RRDtool

Debian packages

rrdtool

References

Homepage: http://oss.oetiker.ch/rrdtool/
Tutorial: man rrdtutorial

Glossary

RRD: Round Robin Database. Is represented by a file
DS: Data Source. An RRD may contain 1-n DS
DST: Data Source Type (e.g. COUNTER)
RRA: Round Robin Archive. An RRD may contain 1-n RRA
PDP: Primary Data Point. The value calculated for one step interval from the samples fed into a DS
CDP: Consolidated Data Point. An entry inside an RRA which is calculated by processing one or more PDPs
CF: Consolidating Function. Processes PDPs to calculate a CDP

Creating a database

rrdtool create test.rrd                \
            --start 920804400          \
            DS:speed:COUNTER:600:U:U   \
            RRA:AVERAGE:0.5:1:24       \
            RRA:AVERAGE:0.5:6:10

creates database in file test.rrd
starts the database at time_t 920804400 (= March 7, 1999)
- alternative ways to specify dates (using the "at-style", see "man rrdfetch" for details)
  - keywords referring to a specific date/time: now, yesterday, today, tomorrow, midnight, noon, teatime
  - keywords referring to amounts of time: years, months, weeks, days, hours, minutes, seconds
  - March 8 1999
  - 23:59 31.12.1999
  - 19970703 12:45
  - noon yesterday -3hours (same as "9am-1day")
  - -5h45min (same as "-6h+15min")
database contains 1 data source (DS)
- data source is named "speed" (max. 19 characters)
- data source type (DST) is "counter"
  - possible DSTs: gauge, counter, derive, absolute, compute
  - DST determines remaining arguments of the data source entry
- data source has a "heartbeat" of 600 seconds (10 minutes)
  - the heartbeat defines the ***maximum*** number of seconds that 2 samples may be apart before a value of *UNKNOWN* is assumed
  - if during a single heartbeat, multiple samples are fed into the database, an average rate is calculated from these samples
- we don't expect values to be in a specific range
  - U = "unknown"
  - range is given as minimum:maximum
  - if a min/max value is specified and the actual value of the data source is outside of that range, the value is assumed to be *UNKNOWN*
database contains 2 round robin archives (RRA)
- every RRA stores a number of values for ***each*** DS
- both RRAs in the example are fed one value every 300 seconds
  - 300 seconds because the "--step" option is omitted; this option defaults to 300 seconds
  - the value fed into the RRA is called "primary data point" (PDP)
  - the PDP is the average of all the DS samples fed into the database since the time of the last PDP
- both RRAs employ the "consolidating function" (CF) AVERAGE to calculate a "consolidated data point" (CDP) from one or more PDPs; the CDP is the value that is stored in the RRA
  - possible CFs are: AVERAGE, MIN, MAX, LAST
  - CFs use PDPs (not DS samples) as their parameters
  - remaining arguments are the same for all CFs
- for both RRAs, the allowed fraction of *UNKNOWN* PDPs (the "xfiles factor") is 0.5, i.e. 50%
  - if a CF gets more PDPs with an *UNKNOWN* value than what is allowed, the resulting CDP will be *UNKNOWN*
- the first AVERAGE function uses 1 PDP, the second 6 PDPs to calculate their respective CDP
- the first RRA stores 24 values, the second RRA stores 10 values
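To make the arithmetic behind the two RRA definitions explicit, here is a small Python sketch. It derives the resolution and the covered time span of each RRA from the default step of 300 seconds, and it mimics the AVERAGE consolidation with None standing in for *UNKNOWN*. The function names are hypothetical, and the exact comparison against the allowed fraction of unknown values is my reading of the notes above, not a statement about rrdtool's internals.

```python
STEP = 300  # seconds; the default used when "--step" is omitted


def parse_rra(definition, step=STEP):
    """Split 'RRA:CF:xff:steps:rows' and derive resolution and time span."""
    _tag, cf, xff, steps, rows = definition.split(":")
    steps, rows = int(steps), int(rows)
    return {
        "cf": cf,
        "xff": float(xff),
        "resolution": steps * step,   # seconds covered by one CDP
        "span": steps * step * rows,  # seconds covered by the whole RRA
    }


def consolidate_average(pdps, xff):
    """Average a group of PDPs into one CDP; None stands for *UNKNOWN*.

    Assumption: the CDP becomes unknown as soon as the share of unknown
    PDPs exceeds xff.
    """
    known = [pdp for pdp in pdps if pdp is not None]
    unknown = len(pdps) - len(known)
    if not known or unknown > xff * len(pdps):
        return None
    return sum(known) / len(known)


if __name__ == "__main__":
    for definition in ("RRA:AVERAGE:0.5:1:24", "RRA:AVERAGE:0.5:6:10"):
        print(definition, parse_rra(definition))
```

For the example database this gives a resolution of 300 seconds and a span of 7200 seconds (2 hours) for the first RRA, and a resolution of 1800 seconds and a span of 18000 seconds (5 hours) for the second.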

Writing to a database

rrdtool update test.rrd 920804700:12345 920805000:12357 N:12363
[...]

updates database in file test.rrd
multiple (3) values are fed into the database
a value argument consists of one timestamp and 1-n values ("timestamp:value[:value...]")
- the number of values must match the number of data sources
- in the example, we see that the database contains only 1 data source
- in the example, two values have a time_t timestamp and one value has the "now" timestamp
- the timestamp may be specified using the at-style by using "@" instead of ":" as the delimiter
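Because the data source in the example is of type COUNTER, the database does not store the raw values but the rate of change per second between two updates. The following Python sketch illustrates that calculation in a simplified form. The timestamp of the third sample is an assumption (the command above uses N, i.e. the current time), counter wrap-around is simply treated as unknown, and the function name counter_rates is made up.

```python
def counter_rates(samples, heartbeat=600):
    """Turn (timestamp, counter) pairs into (timestamp, rate per second) pairs.

    A rate of None stands for *UNKNOWN*. Simplifications: a gap longer than
    the heartbeat or a decreasing counter both yield None, and the resampling
    onto step boundaries that rrdtool performs is left out.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        interval = t1 - t0
        if interval <= 0:
            raise ValueError("timestamps must be strictly increasing")
        if interval > heartbeat or v1 < v0:
            rates.append((t1, None))
        else:
            rates.append((t1, (v1 - v0) / interval))
    return rates


if __name__ == "__main__":
    samples = [
        (920804700, 12345),
        (920805000, 12357),
        (920805300, 12363),  # assumed timestamp; the page uses "N" here
    ]
    for timestamp, rate in counter_rates(samples):
        print(timestamp, rate)
```

With these samples the counter grows by 12 and then by 6 within 300 seconds each, which gives rates of 0.04 and 0.02 per second.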

Reading from a database

rrdtool fetch test.rrd AVERAGE --start 920804400 --end 920809200

Output (as of RRDtool 1.2.0):
                          speed

 920804700: nan
 920805000: 4.0000000000e-02
 920805300: 2.0000000000e-02
 920805600: 0.0000000000e+00
 920805900: 0.0000000000e+00
 920806200: 3.3333333333e-02
 920806500: 3.3333333333e-02
 920806800: 3.3333333333e-02
 920807100: 2.0000000000e-02
 920807400: 2.0000000000e-02
 920807700: 2.0000000000e-02
 920808000: 1.3333333333e-02
 920808300: 1.6666666667e-02
 920808600: 6.6666666667e-03
 920808900: 3.3333333333e-03
 920809200: nan

prints values every 300 seconds
- 300 seconds because the "--resolution" option is omitted, in which case rrdtool selects the RRA with the finest resolution
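- because we omitted the "--step" option when the database was created, one PDP covers 300 seconds (the default for "--step"); the first RRA uses 1 PDP per CDP and therefore has a resolution of 300 seconds, while the second RRA uses 6 PDPs per CDP and has a resolution of 1800 seconds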
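If the output of rrdtool fetch is to be processed further, it has to be parsed first. The following Python sketch does this for output in the format shown above. It assumes a database with a single data source (one value per row), converts the time_t timestamps to UTC, and maps nan to None. The function name parse_fetch_output is made up.

```python
import math
from datetime import datetime, timezone


def parse_fetch_output(text):
    """Parse 'timestamp: value' rows into (UTC datetime, float or None) tuples.

    Assumption: the database has exactly one data source, so every row
    carries a single value.
    """
    rows = []
    for line in text.splitlines():
        timestamp, colon, value = line.strip().partition(":")
        if not colon or not timestamp.isdigit():
            continue  # skips the header line with the DS names and blank lines
        number = float(value)
        rows.append((
            datetime.fromtimestamp(int(timestamp), tz=timezone.utc),
            None if math.isnan(number) else number,
        ))
    return rows
```

The text to parse could come from subprocess.run(["rrdtool", "fetch", ...], capture_output=True, text=True).stdout, or from a saved copy of the output.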
