[lug] Stopping the New Generation of Spam
Daniel Webb
lists at danielwebb.us
Tue Dec 5 18:32:03 MST 2006
On Tue, Dec 05, 2006 at 05:27:53PM -0700, Philip Cooper wrote:
> 1. The random story still trips them up. It is much like the Story
> spams, you know--my father left me this money in <$some-country> when
> he <$mode-of-death>..... Random story, Story spam, word salad all
> offer enough word combinations that have no business in a real email
> that they are an easy target for a Markov filter.
Now that I think about it, I'll bet you're right: a Markov classifier would
have no problem detecting that the message was *too* random. I'm surprised
they haven't just started lifting paragraphs from Wikipedia or random web
sites, or using archived messages from usenet.
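
A rough sketch of why word salad stands out, assuming a plain word-bigram model with add-one smoothing. This is not CRM114's actual algorithm, and the names BigramModel and avg_log_prob are made up for illustration. Text whose word-to-word transitions were never seen in training gets a lower average log-probability than ordinary prose:

```python
import math
from collections import Counter


def tokenize(text):
    # Crude whitespace tokenizer; a real filter would do much more.
    return text.lower().split()


class BigramModel:
    """Toy first-order Markov model over words (illustrative only)."""

    def __init__(self, corpus):
        words = tokenize(corpus)
        self.unigrams = Counter(words)
        self.bigrams = Counter(zip(words, words[1:]))
        # One extra slot so unseen words still get nonzero probability.
        self.vocab_size = len(self.unigrams) + 1

    def avg_log_prob(self, text):
        # Mean log-probability per word transition, add-one smoothed.
        # Returns 0.0 when the text has fewer than two words.
        words = tokenize(text)
        pairs = list(zip(words, words[1:]))
        if not pairs:
            return 0.0
        total = 0.0
        for prev, cur in pairs:
            num = self.bigrams[(prev, cur)] + 1
            den = self.unigrams[prev] + self.vocab_size
            total += math.log(num / den)
        return total / len(pairs)
```

Trained on your own nonspam mail, a message scoring well below your usual range would be a candidate for "too random". Where to put that threshold is something you would have to tune.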
> The one that concerns me is when they eliminate all of the words from
> the email and just send the image. But what legitimate email is just
> a gif? Those embarrassing x-mas party photos sent around would
> probably be jpegs. And anyone sending just a jpeg is probably in your
> whitelist explicitly or nominally in your nonspam database because you
> trained in one of their emails.
I don't remember the last time someone sent me a legit image as a gif...
years ago, probably. The minute I go to the trouble of bouncing gif-attached
emails, though, they'll switch to jpeg.
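
For what it's worth, a check for "image and nothing else" doesn't have to care about the image format. A minimal sketch using Python's standard email package; the function name image_only and the 20-character threshold are arbitrary choices for illustration, not anything from an existing filter:

```python
from email.message import Message


def image_only(msg: Message, min_text_chars: int = 20) -> bool:
    """Return True if the message carries an image and almost no text.

    Checks the MIME main type ("image"), so gif, jpeg and png are all
    treated the same. HTML markup counts as text here, which is a
    known weakness of this sketch.
    """
    has_image = False
    text_chars = 0
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no content of their own
        maintype = part.get_content_maintype()
        if maintype == "image":
            has_image = True
        elif maintype == "text":
            payload = part.get_payload(decode=True) or b""
            text_chars += len(payload.strip())
    return has_image and text_chars < min_text_chars
```

You would parse the raw message with email.message_from_bytes() and only act on the result for senders who aren't already whitelisted.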
> They could get their images past OCR right now but they are better off
> waiting for everyone to build the wall, then they knock it down. Gumption
> trap for sysadmin types IMHO.
Too late. Every few months I take a close look at the spam I'm getting, just
out of curiosity, to see what techniques they're employing. I did it a minute
ago: the last three image spams I got all used multiple anti-OCR techniques.
They practically look like captchas:
http://danielwebb.us/tmp/anti_ocr_spam.gif
(open at your own risk, I suppose it could have trojans)
> Reasons to not use CRM114:
> 25Meg disk space per filter set. 100k users and you have an issue.
> Performance, CRM114 is super fast but I'm not a super big mailhost.
The only one for me is that it looks like it will take several hours to
understand and implement correctly (maybe half a day to do it right, I'm not
sure). It does look good; I'll probably give it a try on a day off someday.
> I don't want to sound too confident. Windows is attacked by viruses
> in large part because it is the most common system. Linux and OSX are
> less attractive because they are relatively seldom used. The
> popularity of Spamassassin keeps my statistical filter low on the
> malware priority list.
I think you're right. In this case security through obscurity works.