Spam
From HerzbubeWiki
Overview
This page discusses my experiences with the spam problem.
Mail server configuration
Details about my mail server configuration can be found on these pages:
Spam statistics
I have not yet gotten around to writing a script that collects data from my mailbox so that spam statistics can be generated automatically. I have therefore decided that whenever I clean out my spam folder, I will note down the state of affairs in the following table. Each row covers the period of time that has passed since the date of the previous entry.
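A minimal Python sketch of what such a script might look like. It rests on assumptions that this page does not confirm: the spam folder is a Maildir (the path is a placeholder), SpamAssassin marks the spam it recognizes with an `X-Spam-Flag: YES` header, and every message in the spam folder without that header was moved there by hand (i.e. manually trained). The names `PeriodStats`, `collect_stats` and `table_row` are hypothetical. The sketch derives the same columns as the table, plus the "iana.pen" count that appears in the notes.

```python
import mailbox
from dataclasses import dataclass
from datetime import date


@dataclass
class PeriodStats:
    """Raw numbers for one sampling period (hypothetical structure)."""
    days_elapsed: int
    received: int
    per_day: int
    classified: int   # recognized by SpamAssassin
    trained: int      # missed by SpamAssassin, filed by hand
    iana_pen: int     # recipients containing "iana.pen"

    def pct(self, part: int) -> float:
        # Share of all spam received, rounded to one decimal as in the table
        return round(100.0 * part / self.received, 1)


def collect_stats(maildir_path: str, previous: date, today: date) -> PeriodStats:
    """Count the spam in a Maildir folder for the period previous..today."""
    folder = mailbox.Maildir(maildir_path, create=False)
    received = classified = iana_pen = 0
    for msg in folder:
        received += 1
        # Assumption: SpamAssassin adds "X-Spam-Flag: YES" to spam it recognizes;
        # messages without this header were filed into the spam folder manually.
        if (msg.get("X-Spam-Flag") or "").strip().upper() == "YES":
            classified += 1
        # Harvested addresses show up as recipients containing "iana.pen"
        if "iana.pen" in (msg.get("To") or ""):
            iana_pen += 1
    days = (today - previous).days  # must be > 0
    return PeriodStats(
        days_elapsed=days,
        received=received,
        per_day=round(received / days),
        classified=classified,
        trained=received - classified,
        iana_pen=iana_pen,
    )


def table_row(today: date, s: PeriodStats) -> str:
    """Format a row in the column order of the spam statistics table."""
    return (
        f"| {today:%d.%m.%Y} | {s.days_elapsed} | {s.received} | {s.per_day} | "
        f"{s.classified} ({s.pct(s.classified)}%) | "
        f"{s.trained} ({s.pct(s.trained)}%) | - | - |"
    )
```

For example, `print(table_row(date(2012, 1, 5), collect_stats("/path/to/Maildir/.Spam", date(2010, 12, 14), date(2012, 1, 5))))` would print one row ready to paste into the table. False positives and notes would still have to be filled in by hand.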
| Date | Days elapsed | Spam messages received | Spam messages/day | Correctly classified by SpamAssassin | Missed by SpamAssassin (manually trained) | False positives | Notes |
|---|---|---|---|---|---|---|---|
| 04.06.2008 | n/a | 46000 | n/a | n/a | n/a | - | - |
| 12.08.2008 | 70 | 47365 | 677 | 46354 (97.9%) | 1011 (2.1%) | - | - |
| 30.09.2008 | 49 | 39099 | 798 | 38222 (97.8%) | 877 (2.2%) | - | - |
| 07.12.2008 | 68 | 45088 | 663 | 44177 (98.0%) | 911 (2.0%) | - | For a couple of weeks, the daily amount of spam had decreased significantly. I guess I was experiencing the direct result of web host McColo being taken off the net. Unfortunately, the rate is now getting back to "normal" (see this story about the spammers' backup plan). |
| 12.02.2009 | 67 | 45624 | 681 | 44733 (98.0%) | 891 (2.0%) | - | - |
| 23.04.2009 | 70 | 47366 | 677 | 46254 (97.7%) | 1112 (2.3%) | - | I did not train for the last 40 days (while travelling on a freighter). |
| 10.06.2009 | 48 | 49481 | 1031 | 47965 (96.9%) | 1516 (3.1%) | - | I have trained often, even though I have been travelling, but the rate of unrecognized spam has still gone up and is, in fact, worse than during the previous period, in which I did no training for 40 days. Two possible reasons for this are: On a side note: I pruned the auto-whitelist database, which had grown to a massive size over the years, but this should not have influenced the number of unrecognized spam messages. |
| 30.07.2009 | 50 | 58462 | 1169 | 55520 (95.0%) | 2942 (5.0%) | - | The basic spam messages/day rate has increased again, but what is even worse: more spam than ever has slipped past the filter, and the average is now almost 60 spam messages per day in my inbox :-( Will this terror never end? |
| 07.09.2009 | 39 | 45899 | 1177 | 44320 (96.6%) | 1579 (3.4%) | 1 | The picture remains unchanged, but today I have finally, reluctantly, implemented greylisting. It will be interesting to see how greylisting affects this whole spam affair. An interesting number on the side: of all the spam messages I received, 3796 (8.3%) had a recipient that contained "iana.pen". This pretty much says everything about address harvesting... Another side note: today I added the "False positives" column because I had one false positive in the period that just ended. Earlier periods show "-" because I have no reliable numbers for them. However, if I recall correctly, I have had only 2-3 false positives in the 5 to 6 years since I started using SpamAssassin. |
| 30.10.2009 | 53 | 5223 | 99 | 4872 (93.3%) | 351 (6.7%) | - | After almost 2 months I conclude that greylisting is the most effective anti-spam measure I have ever seen: implementing it reduced the messages/day rate by an impressive 92%. Of all the spam that still came through, 4568 (87.5%) messages were delivered via the backup MX (virusscan.solnet.ch), which unfortunately does not implement greylisting. I have now temporarily removed the backup MX entry from my DNS configuration (and reset the greylist daemon's whitelist). It will be very interesting to see the results of this latest experiment. Update on the iana.pen statistics: 358 (6.9%) messages had the string "iana.pen" in the To: header. |
| 18.12.2009 | 49 | 992 | 20 | 699 (70.5%) | 293 (29.5%) | - | After another 7 weeks of running entirely on a diet of greylisting (i.e. with the backup MX turned off the whole time), the numbers look even better: the messages/day rate went down by another hefty 80%, and compared with the pre-greylisting era the improvement is now over 98%!!! An interesting observation is that the effectiveness of greylisting has lowered SpamAssassin's recognition percentage. It appears that spammers who are capable of circumventing greylisting are also better at crafting "quality" spam that can fool SpamAssassin. My new goal therefore is to raise SA's recognition rate to >=95%. iana.pen statistics: 68 (6.9%) messages had the string "iana.pen" in the To: header. |
| 10.03.2010 | 82 | 2441 | 30 | 2078 (85.1%) | 363 (14.9%) | - | In the almost 3 months since the last count, the share of spam missed by SA has halved (from 29.5% to 14.9%), i.e. its recognition rate has risen from 70.5% to 85.1%, probably due to the longer sampling period and therefore a better average. Although I figure I could improve the rate still further by tweaking SA parameters more aggressively, I do not want to risk any false positives. For now, I therefore let the matter stand as it is. iana.pen statistics: 143 (5.9%) messages had the string "iana.pen" in the To: header. |
| 14.12.2010 | 280 | 10133 | 36 | 9554 (94.3%) | 579 (5.7%) | 4 | At 9 months, this has been the longest sampling period since I started this statistics page! I'm glad to see that SA's recognition rate has improved further without any effort on my side. Isn't this what computers are supposed to do: lift the burden of work from man's shoulders? :-) iana.pen statistics: 1563 (15.4%) messages had the string "iana.pen" in the To: header. |
| 05.01.2012 | 387 | 7393 | 19 | 6715 (90.8%) | 678 (9.2%) | - | Slightly more than a year has passed since the last sample. In this time the messages/day spam rate has dropped to an all-time low. It is unclear whether the reason for this is a world-wide decrease in spam, or a decrease in the "quality" of spam, i.e. fewer spam mails make it past the greylisting wall. My gut feeling is that it is the latter. Although less overall spam is good, SpamAssassin's recognition rate has dropped by 3.5 percentage points (from 94.3% to 90.8%). This makes for 1.75 spam mails per day in my inbox, which is still less than the 2.07/day average of the last sampling period (due to the low overall spam rate). iana.pen statistics: 319 (4.3%) messages had the string "iana.pen" in the To: header. |
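The percentages quoted in the notes around the introduction of greylisting, and the number of missed spam messages per day, can be re-derived from the table. A small Python sketch (the helper name `reduction_pct` is hypothetical) using the rounded messages/day values as listed in the table:

```python
def reduction_pct(before: float, after: float) -> float:
    """Relative decrease from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


# Spam messages/day as listed in the table
pre_greylisting = 1177   # period ending 07.09.2009
greylist_plus_mx = 99    # 30.10.2009, backup MX (no greylisting) still active
greylist_only = 20       # 18.12.2009, backup MX removed

print(f"{reduction_pct(pre_greylisting, greylist_plus_mx):.1f}%")  # 91.6%
print(f"{reduction_pct(greylist_plus_mx, greylist_only):.1f}%")    # 79.8%
print(f"{reduction_pct(pre_greylisting, greylist_only):.1f}%")     # 98.3%

# Missed spam per day = manually trained messages / days elapsed
print(f"{579 / 280:.2f}")  # 2.07, period ending 14.12.2010
print(f"{678 / 387:.2f}")  # 1.75, period ending 05.01.2012
```

Using the raw totals instead (e.g. 45899 / 39 rather than 1177) changes these results by at most a few tenths of a percentage point.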
