SpamAssassin
This page is about the email spam filtering software SpamAssassin.
Debian packages
spamassassin spamc
spamd is the daemon form of SpamAssassin. It listens on 127.0.0.1, port 783.
spamc is the client for spamd.
References
Daemon overview:
/usr/share/doc/spamassassin/README.spamd.gz
Introduction to the actual SpamAssassin program:
man spamassassin
Security
As long as no user-specific configuration is required, the daemon does not have to run as root. It can be started using the option -u <username>, e.g. with user spamassassin that was created manually.
As soon as a user-specific configuration is required (e.g. because users want to train their individual Bayes database), the daemon must run as root. Reason: the daemon needs to change uid so that it can read/write the user-specific files.
If user-specific configuration is maintained via MySQL or LDAP, the daemon does not need to run as root.
Configuration
spamd
The file /etc/default/spamassassin contains some essential configuration options for the spamd daemon:
- The OPTIONS string contains command line options for spamd:
- --create-prefs = create a user's preferences file if it does not exist yet
- --max-children 5 = the daemon should run at most 5 child processes
- --helper-home-dir = external programs that spamd launches (e.g. Razor) should get the HOME environment variable set to the value of this option; if the option has no value, the value of HOME is taken from the spamc process that contacted the daemon
CRON=1
enables SpamAssassin to automatically update its rules on a daily basis
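For illustration, here is a minimal sketch of how the settings described above might look inside /etc/default/spamassassin, which uses shell variable syntax. It contains only the options discussed in this section; the file on a real system may carry additional settings, and the exact values are an assumption.

```shell
# Sketch of /etc/default/spamassassin with only the options discussed above.

# Command line options for spamd. --helper-home-dir is given without a value,
# so HOME is taken from the spamc process that contacted the daemon.
OPTIONS="--create-prefs --max-children 5 --helper-home-dir"

# Enable the daily rule update.
CRON=1
```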
SpamAssassin
Source for information in this chapter is man spamassassin, chapter "Configuration Files".
- SpamAssassin first loads its default configuration (i.e. the factory settings) from one of several pre-defined locations. On my Debian box the directory is
/usr/share/spamassassin
- Next it uses site-specific configuration data to override previously set values. Again, a number of pre-defined locations are tried, but on my machine the configuration will be taken from
/etc/spamassassin
- Finally, individual user preferences are loaded that will again override any previously set values. The preferences are taken from the location specified on the command line, or from the following file if nothing is specified on the command line
~/.spamassassin/user_prefs
Default configuration
The base configuration is located in
/usr/share/spamassassin
The directory contains a number of files that are loaded in a pre-defined order (see man page).
Site-specific configuration
Site-specific configuration is taken from
/etc/spamassassin
Notes:
- First, all files ending in .pre are read in lexical order.
- Next, all files ending in .cf are read, again in lexical order.
- The convention seems to be to load plugins in .pre files, and to set configuration options in .cf files.
- Local changes are intended to go into the files local.pre and local.cf.
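The read order described in these notes can be sketched with a small shell function. This is only an approximation for inspecting a directory, not the loader SpamAssassin itself uses; the function name is made up for this example.

```shell
# Print the site-specific configuration files in the order described above:
# first all *.pre files, then all *.cf files, each group sorted lexically.
sa_site_config_order() {
    dir=${1:-/etc/spamassassin}
    for ext in pre cf; do
        # LC_ALL=C gives a plain byte-wise sort that does not depend on the locale
        find "$dir" -maxdepth 1 -type f -name "*.$ext" | LC_ALL=C sort
    done
}
```

Calling sa_site_config_order /etc/spamassassin prints one file path per line in the order in which the files would be read.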
The following important/interesting defaults apply (for details see man spamassassin, chapter "TAGGING"):
- Any existing headers beginning with "X-Spam-" are removed to prevent spammer mischief and also to avoid potential problems caused by prior invocations of SpamAssassin.
- Messages with score 5.0 or higher are classified as spam
- If an incoming message is classified as spam, instead of modifying the original message, SpamAssassin will create a new report message and attach the original message as a message/rfc822 MIME part (ensuring the original message is completely preserved and easier to recover)
- The new report message inherits some headers (if they are present) from the original spam message (e.g. From:, To:)
- A spam message gets this header: X-Spam-Flag: YES
- Although this is not documented, I had one genuine ham message that received the header X-Spam-Flag: No
- All messages (regardless of whether they are classified as spam or ham), get these headers:
- X-Spam-Checker-Version:
- X-Spam-Level:
- X-Spam-Status:
On my machine, the file local.pre does not exist. The file local.cf does exist, but is mostly a copy of /usr/share/spamassassin/local.cf. The changes I made are at the bottom:
# Use the local DNS server which is set up to query DNSBL services
# directly instead of going through the green.ch DNS servers.
dns_server 127.0.0.1
Notes:
- In order for this to work, a local DNS server needs to be running, and it needs to be configured so as not to forward queries destined for DNSBL services to the green.ch DNS servers. See the BIND wiki page for details.
- If you see URIBL_BLOCKED in the classification report generated by SpamAssassin, then something about the DNS server is not working as expected.
Individual user preferences
Individual user preferences are loaded from the location specified on the spamassassin, sa-learn, or spamd command line (see respective manual page for details). If the location is not specified, ~/.spamassassin/user_prefs is used if it exists. SpamAssassin will create that file if it does not already exist, using /usr/share/spamassassin/user_prefs.template as a template. The regular template file on my Debian system contains only comment lines.
The ~/.spamassassin directory contains files with the following user-specific information:
- The auto-whitelist (aka automatic whitelist or AWL) database: a list that tracks scores for regular correspondents, i.e. the scores of all messages that someone has sent in the past are averaged, and the scores of any messages in the future are pushed towards that average.
  - Auto-whitelisting requires the AWL SpamAssassin plugin to be active, i.e. somewhere in the configuration this command must appear: loadplugin Mail::SpamAssassin::Plugin::AWL
  - For some time SpamAssassin had the plugin enabled by default, then it was disabled. For a while I then had the plugin enabled locally (in /etc/spamassassin/local.pre), but this is no longer the case.
  - Although in theory the plugin sounds useful, in practice I didn't have any problems with it being disabled, so at the moment it stays disabled.
- The Bayes database: is updated when messages are auto-learned as spam, or when explicit training with sa-learn occurs.
Plugins
Plugins are enabled in the site-specific configuration in /etc/spamassassin. Plugins are documented in man pages, so e.g.
man Mail::SpamAssassin::Plugin::AutoLearnThreshold
The following command lists all plugins that are currently enabled:
grep -r loadplugin /etc/spamassassin/* | sed -e 's/^[^:]*://' | grep -v ^# | sed -e 's/^loadplugin //' | sort
In my case these are
Mail::SpamAssassin::Plugin::AskDNS
Mail::SpamAssassin::Plugin::AutoLearnThreshold
Mail::SpamAssassin::Plugin::Bayes
Mail::SpamAssassin::Plugin::BodyEval
Mail::SpamAssassin::Plugin::Check
Mail::SpamAssassin::Plugin::DKIM
Mail::SpamAssassin::Plugin::DNSEval
Mail::SpamAssassin::Plugin::FreeMail
Mail::SpamAssassin::Plugin::Hashcash
Mail::SpamAssassin::Plugin::HeaderEval
Mail::SpamAssassin::Plugin::HTMLEval
Mail::SpamAssassin::Plugin::HTTPSMismatch
Mail::SpamAssassin::Plugin::ImageInfo
Mail::SpamAssassin::Plugin::MIMEEval
Mail::SpamAssassin::Plugin::MIMEHeader
Mail::SpamAssassin::Plugin::Pyzor
Mail::SpamAssassin::Plugin::Razor2
Mail::SpamAssassin::Plugin::RelayEval
Mail::SpamAssassin::Plugin::ReplaceTags
Mail::SpamAssassin::Plugin::Rule2XSBody
Mail::SpamAssassin::Plugin::SpamCop
Mail::SpamAssassin::Plugin::SPF
Mail::SpamAssassin::Plugin::URIDetail
Mail::SpamAssassin::Plugin::URIDNSBL
Mail::SpamAssassin::Plugin::URIEval
Mail::SpamAssassin::Plugin::VBounce
Mail::SpamAssassin::Plugin::WhiteListSubject
Mail::SpamAssassin::Plugin::WLBLEval
How scoring works
Classification report
The overall score of a message is the sum of the scores of all tests that match the message when it is classified by SpamAssassin. These scores can be seen in the report that SpamAssassin generates when it classifies a message. Typically this report is part of the X-Spam-Status header added to the classified message. Example:
Content analysis details:   (4.8 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 0.1 RCVD_IN_SBL            RBL: Received via a relay in Spamhaus SBL
                            [95.214.27.225 listed in zen.spamhaus.org]
 0.2 BAYES_999              BODY: Bayes spam probability is 99.9 to 100%
                            [score: 1.0000]
 3.5 BAYES_99               BODY: Bayes spam probability is 99 to 100%
                            [score: 1.0000]
-0.0 SPF_HELO_PASS          SPF: HELO matches SPF record
-0.0 SPF_PASS               SPF: sender matches SPF record
 0.1 URI_HEX                URI: URI hostname has long hexadecimal sequence
 0.1 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.1 DKIM_SIGNED            Message has a DKIM or DK signature, not necessarily
                            valid
-0.1 DKIM_VALID_EF          Message has a valid DKIM or DK signature from
                            envelope-from domain
-0.1 DKIM_VALID             Message has at least one valid DKIM or DK signature
-0.1 DKIM_VALID_AU          Message has a valid DKIM or DK signature from
                            author's domain
 1.0 ACCT_PHISHING_MANY     Phishing for account information
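To double-check that the total in such a report really is the sum of the individual test scores, the "pts" column can be added up with awk. The following is a minimal sketch; the function name is made up, and it assumes that each test line starts with the score followed by a rule name written in capitals, as in the example report.

```shell
# Sum the "pts" column of a SpamAssassin report read from standard input.
# Only lines whose first field is a signed decimal number and whose second
# field looks like a rule name are counted; continuation lines are skipped.
sum_report_scores() {
    awk '$1 ~ /^-?[0-9]+\.[0-9]+$/ && $2 ~ /^[A-Z0-9_]+$/ { total += $1 }
         END { printf "%.1f\n", total }'
}

# Three lines taken from the example report: 3.5 + 0.2 - 0.1
sum_report_scores <<'EOF'
 0.2 BAYES_999              BODY: Bayes spam probability is 99.9 to 100%
                            [score: 1.0000]
 3.5 BAYES_99               BODY: Bayes spam probability is 99 to 100%
-0.1 DKIM_VALID             Message has at least one valid DKIM or DK signature
EOF
# prints 3.6
```

Feeding the complete example report into the function yields 4.8, which matches the total in the report's first line.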
Test scores
The scores assigned to a test are found in the SpamAssassin configuration. On my system, the scores are assigned in the file
/usr/share/spamassassin/50_scores.cf
How the score configuration works is documented in the man page for Mail::SpamAssassin::Conf in the section "SCORING OPTIONS". In short, a score line looks like this:
score BAYES_99 0 0 3.8 3.5
Notes:
- score is the keyword.
- BAYES_99 is the symbolic test name.
- There can be either 1 or 4 numbers. If there is only 1 number it is used as the score for the test in all scenarios. If there are 4 numbers they are used as the score for the test depending on the test scenario.
  - Number 1: Bayes and network tests are both disabled
  - Number 2: Bayes tests are disabled, network tests are enabled
  - Number 3: Bayes tests are enabled, network tests are disabled
  - Number 4: Bayes and network tests are both enabled
- For additional details see man Mail::SpamAssassin::Conf.
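The selection of one of the four numbers can be sketched as follows. This is only an illustration of the scheme described in the notes, not code taken from SpamAssassin; the function name is made up for this example.

```shell
# Pick the effective score from a "score" line for a given scenario.
# $1 = score line
# $2 = 1 if Bayes tests are enabled, else 0
# $3 = 1 if network tests are enabled, else 0
effective_score() {
    echo "$1" | awk -v bayes="$2" -v net="$3" '
        # One number: used in all scenarios
        $1 == "score" && NF == 3 { print $3 }
        # Four numbers: fields 3 to 6, selected by the scenario
        $1 == "score" && NF == 6 { print $(3 + 2 * bayes + net) }
    '
}

effective_score "score BAYES_99 0 0 3.8 3.5" 1 1   # prints 3.5
```

With Bayes enabled and network tests disabled (arguments 1 0) the same line yields 3.8.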
Bayes tests and their scores
Bayes test scores in /usr/share/spamassassin/50_scores.cf and Bayes tests in /usr/share/spamassassin/23_bayes.cf:
# Scores in 50_scores.cf
score BAYES_00 0 0 -1.5 -1.9
score BAYES_05 0 0 -0.3 -0.5
score BAYES_20 0 0 -0.001 -0.001
score BAYES_40 0 0 -0.001 -0.001
score BAYES_50 0 0 2.0 0.8
score BAYES_60 0 0 2.5 1.5
score BAYES_80 0 0 2.7 2.0
score BAYES_95 0 0 3.2 3.0
score BAYES_99 0 0 3.8 3.5
score BAYES_999 0 0 0.2 0.2
endif

# Tests in 23_bayes.cf
body BAYES_00 eval:check_bayes('0.00', '0.01')
body BAYES_05 eval:check_bayes('0.01', '0.05')
body BAYES_20 eval:check_bayes('0.05', '0.20')
body BAYES_40 eval:check_bayes('0.20', '0.40')
# note: tread carefully around 0.5... the Bayesian classifier
# will use that for anything it's unsure about, or if it's untrained.
body BAYES_50 eval:check_bayes('0.40', '0.60')
body BAYES_60 eval:check_bayes('0.60', '0.80')
body BAYES_80 eval:check_bayes('0.80', '0.95')
body BAYES_95 eval:check_bayes('0.95', '0.99')
body BAYES_99 eval:check_bayes('0.99', '1.00')
#Additional rule to add more of a score to BAYES_99 FOR 99.9% to 100%
body BAYES_999 eval:check_bayes('0.999', '1.00')
Notes:
- Each Bayes test is assigned a fixed score. Contrary to intuition, a higher spam probability within a test's range does not raise the score that the test contributes.
- As can be seen from the content of 23_bayes.cf, each test is assigned a non-overlapping probability range. This means that a message will be matched by only one of the tests, and that test's score is then used.
  - The only exception is the test BAYES_999, which overlaps with BAYES_99. So a message that matches BAYES_999 will also match BAYES_99 and will therefore receive two scores.
- The maximum score that a message can achieve from Bayes tests, using the score values from the SpamAssassin default configuration, is 3.8 + 0.2 = 4.0 if only Bayes tests are enabled, or 3.5 + 0.2 = 3.7 if network tests are enabled as well.
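The mapping from a Bayes probability to the matching tests can be sketched like this. The ranges are copied from the tests listed above; treating the lower bound as exclusive and the upper bound as inclusive is an assumption of this sketch, so check_bayes() may behave differently for values that lie exactly on a boundary. The function name is made up for this example.

```shell
# List the Bayes tests whose probability range contains the given value.
# $1 = Bayes probability between 0 and 1
bayes_tests_for() {
    awk -v p="$1" 'BEGIN {
        list = "BAYES_00:0.00:0.01 BAYES_05:0.01:0.05 BAYES_20:0.05:0.20"
        list = list " BAYES_40:0.20:0.40 BAYES_50:0.40:0.60 BAYES_60:0.60:0.80"
        list = list " BAYES_80:0.80:0.95 BAYES_95:0.95:0.99 BAYES_99:0.99:1.00"
        list = list " BAYES_999:0.999:1.00"
        n = split(list, ranges, " ")
        out = ""
        for (i = 1; i <= n; i++) {
            split(ranges[i], r, ":")
            lo = r[2] + 0; hi = r[3] + 0
            # Assumption: lower bound exclusive (except 0), upper bound inclusive
            if ((lo == 0 || p + 0 > lo) && p + 0 <= hi)
                out = out (out == "" ? "" : " ") r[1]
        }
        print out
    }'
}

bayes_tests_for 0.9995   # prints BAYES_99 BAYES_999
```

A probability of 0.5 yields only BAYES_50, which illustrates that all ranges except the topmost one match exactly one test.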
Conclusions:
- Bayes tests alone cannot lift a message over the default threshold of 5.0. In order to be classified as spam, a message also needs to match other tests for an additional total score of at least 1.0 (1.3 if network tests are enabled).
- If a false-negative message already matches the BAYES_999 test, there is probably not much point in putting it through sa-learn anymore.
Testing
The following command can be used to run a message through SpamAssassin to see whether it would be classified as ham or spam:
cat /path/to/message-file | spamc --full --username=foo
Notes:
- The message file is a file located in a Maildir folder.
- --full causes the SpamAssassin report text to be printed instead of the rewritten message (as would normally be required by an MTA such as Exim).
- --username is required only if you run spamc as a different user than the one whose configuration and Bayes database should be used for the classification.
To see just the score without a report, replace --full with --check:
cat /path/to/message-file | spamc --check --username=foo
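To check a whole Maildir folder instead of a single message, the command can be wrapped in a loop. The following is a minimal sketch: the function name and the SPAMC variable are made up for this example, and the folder and user name in the usage note are placeholders.

```shell
# The spamc binary to use. Can be overridden, e.g. to try out the loop
# without a running spamd.
SPAMC=${SPAMC:-spamc}

# Run every message in a Maildir folder through "spamc --check" and print
# one line per message: the result printed by spamc, then the file name.
# $1 = Maildir folder, $2 = user whose configuration should be used
check_maildir() {
    folder=$1
    user=$2
    for msg in "$folder"/cur/* "$folder"/new/*; do
        # Skip unmatched glob patterns and anything that is not a regular file
        [ -f "$msg" ] || continue
        # spamc --check exits with 1 for spam, which is not an error here
        result=$("$SPAMC" --check --username="$user" < "$msg")
        printf '%s %s\n' "$result" "$msg"
    done
}
```

A call such as check_maildir ~/Maildir/.Junk.Incoming foo then lists the score and threshold for every message in that folder.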
Integration with Exim
Summary
Regular spam checking is done inside an ACL.
There is also an alternative that employs a special router and transport, but this is more complicated to understand and not as efficient. If you are interested in details, check out the history of this wiki page and view an older version of the page.
Delivery into junk mail folder
Regardless of which approach is used to check messages for spam content (ACL or router/transport approach), a message that has been classified as spam must be processed by some filter mechanism so that it is delivered into a special junk mail folder instead of to the regular inbox. This filter can be configured in a mail client (MUA), but in my setup the filter is part of the user's ~/.forward file.
Here's an excerpt from a .forward file that illustrates the mechanism:
# Most of the time ham mail does not contain the X-Spam-Flag header.
# However, I found one case where it was present and had the value
# "No". Spam mail, on the other hand, always has the header with
# the value "YES". The test for "Yes" is just to be on the safe side
# in case SpamAssassin devs one day decide to regularize the case of
# the header's value.
if "${if def:h_X-Spam-Flag {def}{undef}}" is "def" and
   ($h_X-Spam-Flag is "YES" or $h_X-Spam-Flag is "Yes")
then
  if $h_to: is "lonelyplanet@herzbube.ch" or
     $h_to: is "paypal@herzbube.ch" or
     $h_to: is "directories.ch@herzbube.ch" or
     $h_to: is "realplayer@herzbube.ch"
  then
    save Maildir/.Junk.spamtrap/
  elif $h_to: contains iana.pen or
       $h_Envelope-to: contains iana.pen
  then
    save Maildir/.Junk.spamtrap.ianapen/
  else
    save Maildir/.Junk.Incoming/
  endif
endif
ACL approach
The ACL that is being referenced by acl_smtp_data (i.e. the ACL that is executed after the DATA command and its data have been received during the SMTP dialog) must be extended with the ACL condition spam. For instance:
warn spam = <username>
The condition is available only when Exim has been compiled with the so-called exiscan patch. In Debian, you have to use the package exim4-daemon-heavy to get the patch.
Details about the operations performed by the spam condition can be looked up in the Exim docs in chapter 40.2 ("Scanning with SpamAssassin").
Essentially the condition contacts the spamd daemon (entirely leaving out spamc), providing the daemon with the message to scan. When the condition "returns" the message is still in its original form, i.e. SpamAssassin did not add any headers. Instead the spam condition sets up a number of expansion variables that can be used to add the spam headers to the message inside the ACL. Also, the condition returns true if SpamAssassin has classified the message as spam.
The following expansion variables are set up:
- $spam_score: The spam score of the message, for example “3.4” or “30.5”. This is useful for inclusion in log or reject messages.
- $spam_score_int: The spam score of the message, multiplied by ten, as an integer value. For example “34” or “305”. This is useful for numeric comparisons in conditions. This variable is special; it is saved with the message, and written to Exim's spool file. This means that it can be used during the whole life of the message on your Exim system, in particular, in routers or transports during the later delivery phase.
- $spam_bar: A string consisting of a number of “+” or “-” characters, representing the integer part of the spam score value. A spam score of 4.4 would have a $spam_bar value of “++++”. This is useful for inclusion in warning headers, since MUAs can match on such strings.
- $spam_report: A multiline text table, containing the full SpamAssassin report for the message. Useful for inclusion in headers or reject messages.
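The relationship between the score and the derived variables can be sketched in a few lines of shell. This merely follows the descriptions in the list above and is not Exim's implementation; in particular, scores between -1 and 1 produce an empty bar here, which may differ from what Exim does. The function name is made up for this example.

```shell
# Derive the integer score and the bar string from a score such as 4.4.
spam_vars() {
    awk -v s="$1" 'BEGIN {
        # Score multiplied by ten; rounding guards against floating point noise
        score_int = int(s * 10 + (s < 0 ? -0.5 : 0.5))
        # The bar has one character per whole point of the score
        n = int(s); ch = "+"
        if (n < 0) { n = -n; ch = "-" }
        bar = ""
        for (i = 0; i < n; i++) bar = bar ch
        printf "spam_score_int=%d spam_bar=%s\n", score_int, bar
    }'
}

spam_vars 4.4   # prints spam_score_int=44 spam_bar=++++
```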
Usually the expansion variables will be used to fill some mail headers. An example that tries to "simulate" the headers added by a default SpamAssassin configuration might look like this:
# Perform classification
warn
  set acl_m9 = ham
  spam = $acl_m8
  set acl_m9 = spam

# Add "X-Spam-Flag:" header only if message was spam
warn
  condition = ${if eq {$acl_m9}{spam} {true}{false}}
  message = X-Spam-Flag: YES

# Add additional headers regardless of message classification
warn
  message = X-Spam-Checker-Version: ???
  message = X-Spam-Level: $spam_score ($spam_bar)
  message = X-Spam-Status: $spam_report
How does the $acl_m8 variable get its value?
- Some ACL (probably the one executed during RCPT) performs recipient verification using verify = recipient
- This results in the execution of the verification router router_localuser_verify (discussed in the "Virtual domains" section on the Exim page)
- That verification router sets address_data = $local_part
- After execution of the verification router has finished, the ACL transfers the value of $address_data into $acl_m8 because $address_data loses its value after the verification process ends
Training the Bayes database
Automatic training
By default SpamAssassin automatically trains the Bayes database with messages that it is currently classifying. On my system, the file /usr/share/spamassassin/10_default_prefs.cf sets the option bayes_auto_learn to 1.
Not all messages are automatically trained upon, though. The man page for Mail::SpamAssassin::Plugin::AutoLearnThreshold provides details:
- A message is automatically trained only if
  - its score is below a certain threshold (in this case it is trained as ham), or
  - its score is above a certain threshold (in this case it is trained as spam).
- The score that is compared to the lower/upper thresholds excludes certain tests.
- In order to train a message as spam, the message must score at least 3 points in the header and at least 3 points in the body.
The defaults for the thresholds are 0.1 (lower) and 12 (upper). On my system the file /usr/share/spamassassin/10_default_prefs.cf contains these default values.
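The auto-learn decision described above can be sketched as a small function. It is a simplification that only follows the rules as summarized on this page: how the plugin arrives at the score, the header points and the body points is not modelled, and the handling of scores that lie exactly on a threshold is an assumption. The function name is made up for this example.

```shell
# Decide whether a message would be auto-learned.
# $1 = score used for auto-learning
# $2 = points scored by header tests
# $3 = points scored by body tests
# Prints "ham", "spam" or "no".
autolearn_decision() {
    awk -v score="$1" -v hdr_pts="$2" -v body_pts="$3" 'BEGIN {
        ham_max = 0.1    # lower threshold (default quoted above)
        spam_min = 12    # upper threshold (default quoted above)
        if (score + 0 < ham_max)
            print "ham"
        else if (score + 0 > spam_min && hdr_pts + 0 >= 3 && body_pts + 0 >= 3)
            print "spam"
        else
            print "no"
    }'
}

autolearn_decision 15 4 5   # prints spam
```

A message with score 15 that collects 14 points in the body but only 1 point in the header is not learned at all, because the header requirement is not met.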
Manual training
Manual training is done by the sa-learn program. See the man page for details.
Basic invocation:
sa-learn --spam /path/to/mail/folder
sa-learn --ham /path/to/mail/folder
To train a single message:
cat messagefile | sa-learn --ham
Notes about the training process:
- Re-training a message as ham that was previously trained as spam (and vice versa) makes SpamAssassin forget about the previous training
- 1000 messages each for ham and spam is the minimum required for successful training; training fewer messages works, too, but the results are not satisfying
- Training more than 5000 messages does not improve the quality of results much
- Training should be done with current spam messages since spam continually changes its appearance
- The age of ham and spam messages being trained should be about the same; training old spam but new ham leads to messages with an old timestamp being classified as spam - even if they are ham
- The quantity of ham messages being trained should be larger than the quantity of spam messages; if it is smaller, results may not be satisfying (esp. if only a few ham messages are trained)
cron
Users are given the opportunity to train their individual spam filter. A periodic cron job run in every user's context scans two pre-defined IMAP folders from which it learns ham and spam messages. The folders are
~/Maildir/.Junk.Training-ham
~/Maildir/.Junk.Training-spam
The user may place any messages she wants into these folders. Usually such messages will be false-positives and/or false-negatives.
This is the cron job definition:
root@pelargir:~# cat /etc/cron.d/pelargir-sa-learn
0 * * * * patrick /usr/local/htb/bin/htb-sa-learn.sh
0 * * * * francesca /usr/local/htb/bin/htb-sa-learn.sh
The script htb-sa-learn.sh is part of my "herzbube's tool box", you can find it here. A script that does the same but can be run standalone is this one:
#!/bin/bash

# ------------------------------------------------------------
# Arguments
#   None
#
# Exit codes
#   0 = ok
#   1 = this program is already running for the current user
#   2 = a prerequisite could not be found
# ------------------------------------------------------------

# ------------------------------------------------------------
# Initialize variables

# Maildirs
MAIL_DIR="Maildir"
TRAINING_HAM_DIR=".Junk.Training-ham"
TRAINING_SPAM_DIR=".Junk.Training-spam"
TRAINED_AS_HAM_DIR=".Junk.Trained-as-ham"
TRAINED_AS_SPAM_DIR=".Junk.Trained-as-spam"

# Programs
SA_LEARN_BIN=/usr/bin/sa-learn

# Other variables
LOCK_FILE="$HOME/$(basename "$0").$LOGNAME.pid"

# ------------------------------------------------------------
# Check if this program is already running for the current user
if test -f "$LOCK_FILE"; then
  exit 1
fi

# ------------------------------------------------------------
# Sanity checks
for BIN in "$SA_LEARN_BIN"
do
  which "$BIN" >/dev/null 2>&1
  if test $? -ne 0; then
    echo "$BIN could not be found"
    exit 2
  fi
done

# ------------------------------------------------------------
# Create lock file. From now on, do not return without removing the file
echo $$ >"$LOCK_FILE"

# ------------------------------------------------------------
# Process all messages
for MESSAGE_TYPE in ham spam
do
  if test "$MESSAGE_TYPE" = "ham"; then
    TRAINING_BASE_DIR="$HOME/$MAIL_DIR/$TRAINING_HAM_DIR"
  elif test "$MESSAGE_TYPE" = "spam"; then
    TRAINING_BASE_DIR="$HOME/$MAIL_DIR/$TRAINING_SPAM_DIR"
  else
    continue
  fi

  # Learn messages from both Maildir subdirectories
  for SUB_DIR in new cur
  do
    TRAINING_DIR="$TRAINING_BASE_DIR/$SUB_DIR"
    if test ! -d "$TRAINING_DIR"; then
      echo "Training directory not found: $TRAINING_DIR"
      continue
    fi

    # Learn/re-learn messages
    echo "Learning $MESSAGE_TYPE from $TRAINING_DIR for $USER" 2>&1 | logger
    "$SA_LEARN_BIN" "--$MESSAGE_TYPE" "$TRAINING_DIR" 2>&1 | logger
  done
done

# ------------------------------------------------------------
# Cleanup
rm -f "$LOCK_FILE"
cron, approach 2
The following command line fetches all email in a given mailbox directory and stores it in a file. The messages are modified with an additional "Received:" header.
/usr/bin/fetchmail -a -v -v -n -p IMAP --folder 'INBOX.aaa' -u patrick -m 'bash -c "/usr/bin/tee >/tmp/aaa.test"' localhost
The following command line delivers a mail message in a given file to a given directory. The command must be run as the user to whose mailbox delivery should take place.
cat /tmp/aaa.test | maildrop /tmp/maildroprc.aaa
The maildrop filter file looks like this; it must belong to the user for whom delivery takes place:
to "$HOME/Maildir/.aaa"
Managing the Bayes database
Backup
This command backs up a user's Bayes database:
sudo -u <username> sh -c "sa-learn --backup >/tmp/bayes.db"
Restore
This command restores a user's Bayes database from a backup previously made with the --backup option:
sudo -u <username> sh -c "sa-learn --restore /tmp/bayes.db"
Clear
This command clears a user's Bayes database, for instance if wrong messages have been learnt over time and a re-training is needed:
sudo -u <username> sa-learn --clear
This essentially deletes the Bayes database files in ~/.spamassassin.
Writing custom rules
References
- https://cwiki.apache.org/confluence/display/SPAMASSASSIN/WritingRules
- https://flylib.com/books/en/3.44.1.27/1/
- Look for inspiration in the rules defined in /usr/share/spamassassin
Where to place custom rules?
Rules that are to be applied site-wide (this is the location on Debian; the upstream default is /etc/mail/spamassassin/local.cf):
/etc/spamassassin/local.cf
Rules that are to be applied only for a specific user:
~/.spamassassin/user_prefs
Important: User-specific rules in user_prefs are ignored by spamd, unless the allow_user_rules option is added to local.cf. Add this option only if you trust your users not to attempt to break into the system via a SpamAssassin exploit.
Basic rule anatomy
A rule consists of up to 3 lines of text with the following syntax:
<what> <rule-name> <rule-definition>
score <rule-name> <score-definition>
describe <rule-name> <description>
Notes:
- The <what> line assigns a rule definition to a rule.
  - <what> specifies what part of the email message should be examined by the rule. Example: header indicates that the rule should examine the email message headers. These are the possible values:
    - header: Rule examines the message headers by checking for matching/non-matching regex, checking for header presence, evaluating Perl code or checking Received headers against DNSBL.
    - body: Rule examines the message subject and the text of the message body by checking for matching regex or evaluating Perl code. All textual MIME parts are decoded, with HTML tags and line breaks removed.
    - rawbody: Rule examines the text of the message body by checking for matching regex or evaluating Perl code. All textual MIME parts are decoded, with HTML tags and line breaks retained.
    - full: Rule examines the undecoded message body including all MIME parts by checking for matching regex or evaluating Perl code.
    - uri: Rule examines the URIs in the message body by checking for matching regex.
    - uridnsbl: Rule examines the URIs in the message body by checking for an address in a DNS-based blacklist.
    - meta: The rule is a meta rule. See the section below on meta rules.
  - <rule-name> gives the rule a name of your choosing.
    - The convention is to write rule names in uppercase.
    - To distinguish local rules from the ones that are packaged with SpamAssassin, it is recommended to add a prefix such as "LOCAL_".
    - Names that start with double-underscore (__) indicate to SpamAssassin that the rule is a sub-rule that is part of a meta rule - see the section below on meta rules.
    - Names that start with "T_" indicate that the rule is a test rule that should run with score 0.01. This is used mainly by the SpamAssassin developers.
  - <rule-definition> contains what the rule is actually supposed to do. The possible contents of a rule definition mostly depend on what was specified for <what>. See the more specific sections below.
- The score line assigns a score definition to a rule. This can be either 1 or 4 score values. For more details see the Test scores section elsewhere on this page.
  - A sub-rule does not (cannot?) have a score at all.
  - A test rule does not need a score definition, it automatically runs with score 0.01.
  - A rule with score 0 is not evaluated.
  - A rule with no score definition, and that is also neither a sub-rule nor a test rule, has a default score 1.0.
- The describe line assigns a description text to a rule. The description text - which can consist of multiple words - will be placed into the verbose report, if verbose reports are used. The default setting in modern SpamAssassin versions is to use verbose reports for the body.
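To make the three-line anatomy concrete, the following sketch prints a complete header rule from its parts. The function name, the rule name LOCAL_TEST_SUBJECT, the score and the description are all made up for this illustration; the output would still have to be placed into local.cf or user_prefs by hand.

```shell
# Print the three lines of a header rule in the layout described above.
# $1 = rule name, $2 = header name, $3 = regex, $4 = score, $5 = description
print_header_rule() {
    printf 'header   %s %s =~ %s\n' "$1" "$2" "$3"
    printf 'score    %s %s\n' "$1" "$4"
    printf 'describe %s %s\n' "$1" "$5"
}

print_header_rule LOCAL_TEST_SUBJECT Subject '/\btest\b/i' 0.5 'Subject contains the word test'
# prints:
# header   LOCAL_TEST_SUBJECT Subject =~ /\btest\b/i
# score    LOCAL_TEST_SUBJECT 0.5
# describe LOCAL_TEST_SUBJECT Subject contains the word test
```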
Header rule definitions
Specific header ("Subject") exists:
header FOO exists:Subject
Notes:
- Header names are case insensitive.
Specific header ("Subject") matching/not matching a regex.
header FOO Subject =~ /\btest\b/i
header FOO Subject !~ /\btest\b/i
Notes:
- Only the header value is used for regex matching. (TODO: really?)
- Special header names
  - ALL: Examines all headers - see example below.
  - ToCc: Examines both the To: and Cc: headers.
  - MESSAGEID: Examines all Message-Id: headers.
  - EnvelopeFrom: Examines the address supplied in the SMTP MAIL FROM command if the MTA provides this information to SpamAssassin.
- The regex syntax is the Perl syntax.
  - In the example \b indicates a word boundary (a position between a word character, i.e. an alphanumeric character or underscore, and anything else), so the regex string "test" must be its own word and is not matched if it is just a substring.
  - The "i" at the end performs case insensitive regex matching.
- A header that does not exist will not match any regular expression. Add [if-unset: STRING] behind the regex to make the rule match against the string literal STRING instead.
- Special treatment for the "From:" and "To:" headers: You can write "From:name" / "To:name" and "From:addr" / "To:addr" to parse specific parts of the header. The name is the thing that, if present, mail clients will show instead of the address. Example: "Nice Name" <ugly@address.com>.
Any header matching a regex:
header FOO ALL =~ /\btest\b/im
Notes:
- The keyword ALL indicates that regex matching should be applied to all headers.
- The "m" at the end is Perl's multi-line modifier: it lets ^ and $ match at the start and end of every header line. If you omit the "m" then ^ and $ match only at the start and end of the whole block of header lines.
- When ALL is used, the entire header line including the header name is used for the regex matching. (TODO: really? it's implied by SpamAssassin wiki docs)
Evaluating Perl code:
header FOO eval:<perl-code>
Notes:
- <perl-code> is a Perl code snippet.
  - One example could be to invoke a function that is already provided by SpamAssassin. For instance, Mail::SpamAssassin::EvalTests provides all sorts of useful functions - check out the rules provided by SpamAssassin itself to get some inspiration.
  - Apparently also useful to perform checks against DNS blacklists: check_rbl(), check_rbl_txt() and check_rbl_sub(). The details are beyond the scope of this page.
Body rule definitions
Preliminary note: Body tests are powerful but slow.
Regex string exists anywhere in the body:
body FOO /\btest\b/i
rawbody FOO /\btest\b/i
full FOO /\btest\b/i
Notes:
- A body rule performs regex matching against a body as it would be likely to appear to a person reading the message in a text-based mail client.
  - All textual MIME components of the message are decoded.
  - HTML tags are removed.
  - The message is reformatted into paragraphs (text separated by multiple newlines), and newlines within paragraphs are removed.
  - The "Subject:" header is included as the first paragraph.
- A rawbody rule performs regex matching against a body as it would be likely to appear to a person reading the message in an HTML-based mail client.
  - All textual MIME components of the message are decoded.
  - The message is split into lines based on the line breaks in the message.
  - The "Subject:" header is not included.
- A full rule performs regex matching against the full text of a message.
  - All headers are included, along with all textual MIME components of the message body, but no decoding is performed.
  - The message is split into lines based on the line breaks in the message.
- The regex matching is performed multiple times on the result of the above pre-processing procedure.
  - body rule: One regex match per paragraph.
  - rawbody rule: One regex match per line.
  - full rule: One regex match per header and per message line.
Body rules can also use evaluating Perl code instead of regex matching. Any of the regex examples above can be written like this:
body FOO eval:<perl-code>
Notes:
- None.
URI rule definitions
Any URIs in the message matching a regex:
uri FOO /^mailto:spammer@spam.com$/
Notes:
- A uri rule performs regex matching against all URIs in the message.
- SpamAssassin creates a list of http, https, ftp, mailto, javascript and file URIs and transforms bare hostnames starting with www or ftp into appropriate URIs.
Beyond the scope of this page: If the plugin Mail::SpamAssassin::Plugin::URIDNSBL is loaded it enables writing rules with the uridnsbl directive. This takes each URI in the message, extracts the name of the host in the URI, looks up its IP address in DNS, and then checks the IP address against a specified DNSBL. Examples may be found in SpamAssassin's own rules.
Meta rule definitions
Meta rule examples:
header __SUB_1 [...]
body __SUB_2 [...]
header __SUB_3 [...]

# Boolean meta rule: Matches if boolean condition is true (sub-rules' values are treated as booleans)
meta META1 (__SUB_1 && (__SUB_2 || ! __SUB_3))
score META1 1.3

# Arithmetic meta rule: Matches if the arithmetic condition is true (sub-rules' values are 1 or 0)
meta META2 (((0.2 * __SUB_1) + (0.8 * __SUB_2) + (1.3 * __SUB_3)) > 2)
score META2 1.3
Notes:
- Meta rules are rules that are boolean or arithmetic combinations of other sub-rules.
- A boolean meta rule matches when the boolean condition using sub-rules is true. You can use parentheses and the logical operators && (logical-and), || (logical-or) and ! (logical-not).
- An arithmetic meta rule matches when the arithmetic condition using sub-rules is true. The value of a sub-rule in an arithmetic meta rule is the true/false (1/0) value for whether or not the rule matched.
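To see which combinations of sub-rules make the arithmetic meta rule META2 from the example match, the condition can be evaluated outside of SpamAssassin. The following sketch does this with awk; the function name is made up for this example.

```shell
# Evaluate the arithmetic condition of the META2 example rule.
# $1..$3 = 1 if __SUB_1..__SUB_3 matched, else 0
# Prints "match" or "no match".
meta2_matches() {
    awk -v s1="$1" -v s2="$2" -v s3="$3" 'BEGIN {
        if ((0.2 * s1) + (0.8 * s2) + (1.3 * s3) > 2)
            print "match"
        else
            print "no match"
    }'
}

meta2_matches 0 1 1   # prints match (0.8 + 1.3 = 2.1)
meta2_matches 1 0 1   # prints no match (0.2 + 1.3 = 1.5)
```

With these weights the rule matches exactly when both __SUB_2 and __SUB_3 match; __SUB_1 alone never tips the balance.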
Versioning
TODO: Rules can be versioned, and rules can specify that they require specific SpamAssassin versions.
Checking rules syntax (aka "linting")
To verify that your custom rules are syntactically correct you can run
spamassassin --lint
This exits silently if everything was OK, otherwise it should give some useful diagnostics. Add the -D option to get even more diagnostic output.
Activating rules
To activate the rules - even if you just want to test them with spamc - you have to restart SpamAssassin:
systemctl restart spamassassin