[lug] Stopping the New Generation of Spam

Daniel Webb lists at danielwebb.us
Tue Dec 5 18:32:03 MST 2006


On Tue, Dec 05, 2006 at 05:27:53PM -0700, Philip Cooper wrote:

> 1. The random story still trips them up.  It is much like the Story
> spams, you know--my father left me this money in <$some-country> when
> he <$mode-of-death>.....  Random story, Story spam, word salad all
> offer enough word combinations that have no business in a real email
> that they are an easy target for a Markov filter.  

Now that I think about it, I'll bet you're right: a Markov classifier would
have no problem detecting that the message was *too* random.  I'm surprised
they haven't just started lifting paragraphs from Wikipedia or random web
sites, or using archived messages from usenet.  
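
To make the "too random" idea concrete, here is a toy Python sketch that
scores a message by how many of its adjacent word pairs never showed up in
known-good mail.  This is not CRM114's actual algorithm; the function names
(train_bigrams, unseen_fraction) and the three-line training corpus are made
up for illustration, and a real filter would train on far more mail, spam
included.

```python
import re
from collections import Counter

def words(text):
    """Lowercase a text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def train_bigrams(corpus):
    """Count adjacent word pairs seen in known-good messages."""
    counts = Counter()
    for message in corpus:
        tokens = words(message)
        counts.update(zip(tokens, tokens[1:]))
    return counts

def unseen_fraction(message, counts):
    """Fraction of adjacent word pairs never seen in training (0.0 to 1.0)."""
    tokens = words(message)
    pairs = list(zip(tokens, tokens[1:]))
    if not pairs:
        return 0.0
    unseen = sum(1 for pair in pairs if counts[pair] == 0)
    return unseen / len(pairs)

# Hypothetical, tiny "non-spam" corpus, only for demonstration.
ham = [
    "the meeting is on tuesday at the library",
    "can you send me the notes from the meeting",
    "i will send the notes on tuesday",
]
model = train_bigrams(ham)

normal = "send me the notes on tuesday"
salad = "tuesday library notes meeting send the"

print(unseen_fraction(normal, model))  # 0.0
print(unseen_fraction(salad, model))   # 0.8
```

The word salad uses only words the model has seen, but almost none of the
transitions between them, which is what gives it away.  Text lifted from
real prose would presumably score much lower on a measure like this.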
 
> The one that concerns me is when they eliminate all of the words from
> the email and just send the image.  But what legitimate email is just
> a gif?  Those embarrassing x-mas party photos sent around would
> probably be jpegs. And anyone sending just a jpeg is probably in your
> whitelist explicitly or nominally in your nonspam database because you
> trained in one of their emails.  

I don't know when someone last sent me a legit image as a gif... years ago,
probably.  The minute I go to the trouble of bouncing gif-attached emails,
though, they'll switch to jpeg.
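
For what it's worth, a rough Python sketch of the "image and nothing else"
check using the standard email package might look like the following.  The
function names, the five-word threshold, and the sample messages are all
assumptions for illustration; the image types are a parameter precisely
because the spammers can switch formats.

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def image_parts(msg, image_types):
    """Return the MIME parts whose content type is one of image_types."""
    return [p for p in msg.walk() if p.get_content_type() in image_types]

def text_word_count(msg):
    """Count whitespace-separated words across all text/plain parts."""
    total = 0
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            total += len(payload.decode("utf-8", errors="replace").split())
    return total

def is_image_only(msg, image_types=("image/gif",), max_words=5):
    """True if the message carries a listed image type and almost no text."""
    has_image = bool(image_parts(msg, image_types))
    return has_image and text_word_count(msg) <= max_words

# Hypothetical sample messages; the image bytes are just a GIF header stub.
spam = MIMEMultipart()
spam.attach(MIMEImage(b"GIF89a", _subtype="gif"))

party = MIMEMultipart()
party.attach(MIMEText("Here are the photos from the party last week, enjoy!"))
party.attach(MIMEImage(b"GIF89a", _subtype="gif"))

print(is_image_only(spam))   # True
print(is_image_only(party))  # False
```

Passing image_types=("image/gif", "image/jpeg") would widen the net, at the
cost of catching more legitimate photo mail from people not yet whitelisted.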

> They could get their images past OCR right now but they are better off
> waiting for everyone to build the wall, then they knock it down.  Gumption
> trap for sysadmin types IMHO. 

Too late.  Every few months I thoroughly check out the spam I'm getting, just
out of curiosity, to see what techniques they're employing.  I did it a minute
ago: the last three image spams I got all had multiple anti-OCR techniques.
They practically look like captchas:

http://danielwebb.us/tmp/anti_ocr_spam.gif
(open at your own risk; I suppose it could have trojans)

> Reasons to not use CRM114:
> 25Meg disk space per filter set.  100k users and you have an issue.
> Performance, CRM114 is super fast but I'm not a super big mailhost.

The only one for me is that it looks like it will take several hours to
understand and implement correctly (maybe half a day to do it right, I'm not
sure).  It does look good; I'll probably give it a try on a day off someday.
 
> I don't want to sound too confident.  Windows is attacked by viruses
> in large part because it is the most common system.  Linux and OSX are
> less attractive because they are relatively seldom used.  The
> popularity of Spamassassin keeps my statistical filter low on the
> malware priority list.

I think you're right.  In this case security through obscurity works.



