Curbing Image/PDF spam: SpamAssassin



A lot of image and PDF spam had been slipping through my office MXs ever since this spamming technique gained popularity, and it was getting really out of hand. I decided to put an end to this madness and experimented with various tactics to curb image/PDF spam. Generally, this can be achieved with spam scoring from SpamAssassin, or with ClamAV using Sanesecurity's Phishing and Scam Signatures for ClamAV.

In this post, I will share some of the tactics I have tried with SpamAssassin. With SpamAssassin, fighting image/PDF spam turned out to be trivial.

SpamAssassin rules

A) Built-in ruleset

TVD_PDF_FINGER01, which matches the standard PDF spam fingerprint (emails that have empty bodies but contain PDF attachments), was added by the SpamAssassin developers. It works well, adding 1.0 point to PDF spam. However, this is too low to catch PDF spam effectively, as the threshold for tagging spam commonly stands between 5.0 and 10.0. Increasing the score is a bad idea, since a lot of lazy users regularly send PDF attachments with empty message bodies, and this could lead to false positives.
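To make the fingerprint and the scoring arithmetic concrete, here is a minimal Python sketch using only the standard email package. It models the fingerprint as described above (a blank text body plus an application/pdf part); it is not SpamAssassin's actual rule definition, and the helper names (has_empty_body_with_pdf, fingerprint_score, build_sample) and the 5.0 threshold constant are illustrative assumptions.

```python
from email.message import Message
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

FINGERPRINT_SCORE = 1.0  # score the post quotes for TVD_PDF_FINGER01
REQUIRED_SCORE = 5.0     # assumed: low end of the common tagging threshold


def has_empty_body_with_pdf(msg: Message) -> bool:
    """Approximate the fingerprint as described: blank text body + PDF attachment.

    Illustrative only; this is not how SpamAssassin implements the rule.
    """
    has_pdf = False
    body_text = ""
    for part in msg.walk():
        if part.is_multipart():
            continue  # multipart containers carry no payload of their own
        ctype = part.get_content_type()
        if ctype == "application/pdf":
            has_pdf = True
        elif ctype in ("text/plain", "text/html"):
            raw = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            body_text += raw.decode(charset, errors="replace")
    return has_pdf and not body_text.strip()


def fingerprint_score(msg: Message) -> float:
    """Points this fingerprint alone would add to the message's total score."""
    return FINGERPRINT_SCORE if has_empty_body_with_pdf(msg) else 0.0


def build_sample(body: str) -> MIMEMultipart:
    """Build a test message: a text part containing `body` plus a fake PDF attachment."""
    msg = MIMEMultipart()
    msg.attach(MIMEText(body, "plain"))
    pdf = MIMEApplication(b"%PDF-1.4 placeholder", _subtype="pdf")
    pdf.add_header("Content-Disposition", "attachment", filename="invoice.pdf")
    msg.attach(pdf)
    return msg
```

With 1.0 point against a 5.0 threshold, a message matching only this fingerprint still needs 4.0 more points from other rules before it is tagged, which is why this rule alone is too low to catch PDF spam.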

B) Custom rulesets

This one goes to Ditesh, who wanted to further tighten his server by blocking attachments from strangers. I would suggest using this ruleset with higher scores instead (blocking is not a good idea). This custom ruleset was recently posted by Eric A. Hall on the SpamAssassin-Users list. It uses the AWL (auto-whitelist) to determine whether the sender of a binary attachment is a stranger (image/PDF spammers, of course, are strangers to you ;-)). As the MIMEHeader plugin is included by default in the SpamAssassin 3.2.x series, you can simply add the ruleset to your local.cf.

ifplugin Mail::SpamAssassin::Plugin::MIMEHeader
mimeheader  __L_C_TYPE_APP     Content-Type =~ /^application/i
mimeheader  __L_C_TYPE_IMAGE   Content-Type =~ /^image/i
mimeheader  __L_C_TYPE_AUDIO   Content-Type =~ /^audio/i
mimeheader  __L_C_TYPE_VIDEO   Content-Type =~ /^video/i
mimeheader  __L_C_TYPE_MODEL   Content-Type =~ /^model/i
meta        L_STRANGER_APP     (!AWL && __L_C_TYPE_APP)
score       L_STRANGER_APP     1.0
tflags      L_STRANGER_APP     noautolearn
priority    L_STRANGER_APP     1001 # defer till after AWL
describe    L_STRANGER_APP     Application file sent by a stranger
meta        L_STRANGER_IMAGE   (!AWL && __L_C_TYPE_IMAGE)
score       L_STRANGER_IMAGE   1.0
tflags      L_STRANGER_IMAGE   noautolearn
priority    L_STRANGER_IMAGE   1001 # defer till after AWL
describe    L_STRANGER_IMAGE   Image file sent by a stranger
meta        L_STRANGER_AUDIO   (!AWL && __L_C_TYPE_AUDIO)
score       L_STRANGER_AUDIO   1.0
tflags      L_STRANGER_AUDIO   noautolearn
priority    L_STRANGER_AUDIO   1001 # defer till after AWL
describe    L_STRANGER_AUDIO   Audio file sent by a stranger
meta        L_STRANGER_VIDEO   (!AWL && __L_C_TYPE_VIDEO)
score       L_STRANGER_VIDEO   1.0
tflags      L_STRANGER_VIDEO   noautolearn
priority    L_STRANGER_VIDEO   1001 # defer till after AWL
describe    L_STRANGER_VIDEO   Video file sent by a stranger
meta        L_STRANGER_MODEL   (!AWL && __L_C_TYPE_MODEL)
score       L_STRANGER_MODEL   1.0
tflags      L_STRANGER_MODEL   noautolearn
priority    L_STRANGER_MODEL   1001 # defer till after AWL
describe    L_STRANGER_MODEL   Model file sent by a stranger
endif
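The logic of these meta rules can be modelled in a few lines of Python. This is a sketch under simplifying assumptions, not how SpamAssassin evaluates rules internally: awl_hit stands in for whether the AWL rule fired (roughly, whether the sender already has history), and the Content-Type patterns are copied from the __L_C_TYPE_* rules above. The function names are hypothetical.

```python
import re
from email.message import Message
from typing import Dict, Iterable, List

# Content-Type patterns copied from the __L_C_TYPE_* mimeheader rules above.
STRANGER_RULES = {
    "L_STRANGER_APP": re.compile(r"^application", re.I),
    "L_STRANGER_IMAGE": re.compile(r"^image", re.I),
    "L_STRANGER_AUDIO": re.compile(r"^audio", re.I),
    "L_STRANGER_VIDEO": re.compile(r"^video", re.I),
    "L_STRANGER_MODEL": re.compile(r"^model", re.I),
}
RULE_SCORE = 1.0  # each L_STRANGER_* rule is scored 1.0 in the ruleset


def part_content_types(msg: Message) -> List[str]:
    """Collect the Content-Type header of every MIME part (empty if absent)."""
    return [str(part.get("Content-Type", "")) for part in msg.walk()]


def stranger_hits(content_types: Iterable[str], awl_hit: bool) -> Dict[str, float]:
    """Mimic the meta rules (!AWL && __L_C_TYPE_x) for each category."""
    if awl_hit:
        return {}  # sender is known to the AWL, so no L_STRANGER_* rule fires
    types = list(content_types)
    hits = {}
    for name, pattern in STRANGER_RULES.items():
        # A rule fires at most once per message, however many parts match.
        if any(pattern.search(ct) for ct in types):
            hits[name] = RULE_SCORE
    return hits
```

One consequence worth noting: each rule contributes its 1.0 at most once per message, so a stranger's mail carrying two images and a PDF scores 2.0 from this ruleset (L_STRANGER_IMAGE plus L_STRANGER_APP), not 3.0. That is why the scores need raising before the ruleset has much effect.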

PDFInfo

Grab PDFInfo.pm and pdfinfo.cf from the PDFInfo plugin site. Place pdfinfo.cf in SpamAssassin's configuration directory (/usr/local/etc/mail/spamassassin/) and PDFInfo.pm in the SpamAssassin plugin directory (/usr/local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/Plugin/). To load the plugin, add loadplugin Mail::SpamAssassin::Plugin::PDFInfo to init.pre (or v310.pre). Alternatively, if you keep PDFInfo.pm in a directory other than the SpamAssassin plugin directory, use loadplugin Mail::SpamAssassin::Plugin::PDFInfo /path/to/your/plugin. With that in place, restart SpamAssassin and verify that the PDFInfo plugin was loaded properly using SpamAssassin's debug output:

spamassassin --lint -D

You should see lines similar to the following:

[32487] dbg: config: read file /usr/local/etc/mail/spamassassin/pdfinfo.cf
[32487] dbg: plugin: loading Mail::SpamAssassin::Plugin::PDFInfo from @INC
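If those lines do not show up, one thing worth checking is that the loadplugin directive in init.pre (or v310.pre) is actually active and not commented out. The Python sketch below scans the text of a .pre file for it. The function name, the regular expression and the /opt/sa-plugins/ path used when trying it out are hypothetical, and comment handling is simplified (everything after a # is dropped).

```python
import re
from typing import Optional, Tuple

# "loadplugin Module::Name [/optional/path/to/Module.pm]"
LOADPLUGIN_RE = re.compile(r"^\s*loadplugin\s+(\S+)(?:\s+(\S+))?")


def find_loadplugin(pre_text: str, module: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (module, path-or-None) for the first active loadplugin line for `module`."""
    for line in pre_text.splitlines():
        # Simplified comment stripping: a commented-out loadplugin loads nothing.
        code = line.split("#", 1)[0]
        match = LOADPLUGIN_RE.match(code)
        if match and match.group(1) == module:
            return match.group(1), match.group(2)
    return None
```

A None result means no active directive was found, so SpamAssassin would never try to load the plugin in the first place.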

FuzzyOcr

I installed the FuzzyOcr plugin from the FreeBSD ports (/usr/ports/mail/p5-FuzzyOcr-devel/), which makes it easy to maintain. The FuzzyOcr development version is recommended, as the stable release was way too old. Manual installation is also relatively easy, as the tarball contains the FuzzyOcr Perl module, configuration files and some sample image/PDF test mails. Just copy FuzzyOcr.cf and FuzzyOcr.words to SpamAssassin's configuration directory (if you installed from ports, the configuration file is located in /usr/local/share/examples/FuzzyOcr/). I created a directory called "fuzzyocr" in /var/db for all FuzzyOcr databases and the word list. My configuration file looks like this:

focr_enable_image_hashing 2
focr_global_wordlist /var/db/fuzzyocr/FuzzyOcr.words
focr_scansets $gocr -i $pfile, $gocr -l 180 -d 2 -i $pfile, $ocrad -s 0.5 -T 0.5 $pfile
focr_digest_db /var/db/fuzzyocr/FuzzyOcr.hashdb
focr_db_hash /var/db/fuzzyocr/FuzzyOcr.db
focr_db_safe /var/db/fuzzyocr/FuzzyOcr.safe.db
focr_hashing_learn_scanned 1

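For reference, focr_scansets holds a comma-separated list of OCR command lines. My reading, which is an assumption rather than FuzzyOcr's documented behaviour, is that $gocr and $ocrad stand for the paths of the gocr and ocrad OCR programs and $pfile for the image file being scanned. The Python sketch below expands the line from my configuration into argument lists under that assumption; it is not FuzzyOcr's own parser, and the program paths used when trying it out are hypothetical.

```python
import shlex
from typing import Dict, List


def expand_scansets(value: str, programs: Dict[str, str], pfile: str) -> List[List[str]]:
    """Split a focr_scansets value into one argv list per scanset.

    Assumed placeholders: $pfile -> image file, $<name> -> programs[name].
    """
    commands = []
    for scanset in value.split(","):
        argv = []
        for token in shlex.split(scanset):
            if token == "$pfile":
                argv.append(pfile)
            elif token.startswith("$") and token[1:] in programs:
                argv.append(programs[token[1:]])
            else:
                argv.append(token)  # ordinary option or value, kept verbatim
        if argv:
            commands.append(argv)
    return commands
```

Applied to the line above, this yields three separate passes (two gocr runs with different options and one ocrad run), presumably giving the plugin several chances to pull readable text out of the same image.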
Again, verify that the plugin is loaded properly with spamassassin --lint -D.
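Checking the debug output by eye works, but it can also be scripted. The Python sketch below looks for the two kinds of lines shown earlier for PDFInfo (the .cf file being read and the plugin module being loaded). It assumes spamassassin is on the PATH and that its -D debug output goes to stderr; the function names are hypothetical, and you supply the module and .cf names for whichever plugin you are checking.

```python
import subprocess


def plugin_loaded(debug_output: str, module: str, cf_name: str) -> bool:
    """True if the debug output shows both the .cf file being read and the module loading."""
    cf_read = False
    module_loaded = False
    needle = f"dbg: plugin: loading {module} "  # trailing space avoids prefix matches
    for line in debug_output.splitlines():
        if "dbg: config: read file" in line and line.rstrip().endswith("/" + cf_name):
            cf_read = True
        if needle in line + " ":
            module_loaded = True
    return cf_read and module_loaded


def lint_debug_output() -> str:
    """Run `spamassassin --lint -D` and return its debug output (assumed on stderr)."""
    result = subprocess.run(["spamassassin", "--lint", "-D"], capture_output=True, text=True)
    return result.stderr
```

For PDFInfo, the call would be plugin_loaded(lint_debug_output(), "Mail::SpamAssassin::Plugin::PDFInfo", "pdfinfo.cf").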

Other tactics

There are other tactics for fighting image/PDF spam that I have not tried. The ones I am aware of at the time of writing are the PDFText plugin and the Botnet plugin (with a patch).

Conclusions

There has been a lot of discussion and experience sharing on the SpamAssassin-users and Maia-users lists. One notable post (titled "[Maia-users] PDF spam solutions") was written by Robert LeBlanc on the Maia-users list. It is comprehensive enough to give you an edge in fighting image/PDF spam. Nevertheless, new spam tactics evolve day by day. Who knows, we might be seeing M$ Word/PowerPoint spam soon.
