uwMike.com

I'm in Waterloo at the moment, and next available to work in September 2008.

Spam Gets Chatty

February 28th, 2005

On a lark, I checked my briefly-used GMail account to see what it had accrued in my absence. Among the messages that got past the filter, I noticed a large number of spams that followed this general formula:

did you hear abot that little device for decoding all the channels that Myles got last week he says it works real good and he is watching all thes ppv movies and sporting events for nothin…LOL…i thought ya right but it actually does work. check it out if you want at this place [link] but if you dont want thats fine as you can stop by and tell us to not tell you anymore [link]

To be fair to Google, I had deliberately seeded that address to fish for 419ers, the baiting of whom was a hobby I briefly considered engaging in. Several ‘dead’ address books were signed as a wealthy Australian surgeon, with a fat portfolio ready for his retirement.

Myles And His TV For Nothin

So… I never knew anyone named Myles, and neither did my Australian alter ego. It’s an interesting approach to spam: not only does it appear to comply with the CAN-SPAM regulations by including unsubscription verbiage, but it also actively attempts to defeat statistical analysis without including an obvious body of ‘innocent words’.

Phrases like “last week”, “real good”, and “stop by” probably give this message the green light under current filtering schemes. And frankly, I’m not sure if I want messages that look like [block of informal text + link] to be filtered, because I get a lot of those.
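
To make the dilution effect concrete, here is a rough sketch of the naive Bayes style combination that many statistical filters use. The per-word probabilities and the `spam_score` helper are invented for illustration; a real filter learns its numbers from a user's own mail.

```python
from math import prod

# Hypothetical per-word spam probabilities (made up for this sketch;
# a trained filter would derive them from real mail).
WORD_SPAM_PROB = {
    "decoding": 0.90, "channels": 0.80, "ppv": 0.95,
    "last": 0.30, "week": 0.25, "real": 0.35,
    "good": 0.30, "stop": 0.40, "by": 0.35,
}

def spam_score(words, unknown=0.4):
    """Combine per-word probabilities into one score between 0 and 1."""
    probs = [WORD_SPAM_PROB.get(w.lower(), unknown) for w in words]
    spam = prod(probs)                  # evidence the message is spam
    ham = prod(1 - p for p in probs)    # evidence the message is innocent
    return spam / (spam + ham)

pitch = "decoding channels ppv".split()
chatty = pitch + "last week real good stop by".split()

print(f"pitch alone: {spam_score(pitch):.3f}")
print(f"with chatter: {spam_score(chatty):.3f}")
```

With these invented numbers the chatty version scores noticeably lower than the bare pitch, which is all a spammer needs if the filter's cut-off sits in between.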

The Final Spam Filter

What does the Ultimate Iron-Clad Spam Filter look like? I think what it does is crawl the email for links, visit those sites, and then analyse them for spamminess. Probably even keep a central index of ‘sites that spams link to’, the inclusion of a link to which is the ultimate damnation for a message.
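
A minimal sketch of the index-lookup half of that idea, assuming the central index is just a set of hostnames. The `SPAM_LINK_INDEX` contents, the URL pattern, and both function names are my own placeholders, and the step of actually visiting and analysing each site is left out.

```python
import re
from urllib.parse import urlparse

# Hypothetical central index of 'sites that spams link to'.
SPAM_LINK_INDEX = {"cheap-decoders.example", "pills.example"}

# Deliberately simple URL matcher; real mail would need HTML parsing too.
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+", re.IGNORECASE)

def linked_hosts(body):
    """Return the set of hostnames that a message body links to."""
    hosts = set()
    for url in URL_RE.findall(body):
        host = urlparse(url).hostname
        if host:
            hosts.add(host.rstrip("."))
    return hosts

def links_to_known_spam_site(body):
    """True if any link in the body points at an indexed host."""
    return not linked_hosts(body).isdisjoint(SPAM_LINK_INDEX)

spam = ("check it out at http://cheap-decoders.example/tv but if you "
        "dont want thats fine http://cheap-decoders.example/stop")
friendly = "photos from last week are up at http://example.org/photos"

print(links_to_known_spam_site(spam))
print(links_to_known_spam_site(friendly))
```

Note that the wording of the message never enters into it: the chatty text and the unsubscribe link both point at the same host, and the host is what gets checked.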

As I’ve written previously, blocking spams by source-IP is not an acceptable solution. However, I think blocking them by destination may just be the silver bullet. Yes, a spammer can buy up 200 domains to rotate through his spams, but once the first couple of people report the mail as junk, those domains will be quickly flagged. And that assumes the actual text of the site is clean enough to avoid being flagged by the content-based filter.
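
Here is one way the report-driven flagging might work, sketched under the assumption that a domain is flagged once two distinct people report it. The `DestinationBlocklist` class, its threshold, and the domain names are all hypothetical.

```python
from collections import defaultdict

class DestinationBlocklist:
    """Flags a link destination once enough distinct users report it."""

    def __init__(self, threshold=2):
        self.threshold = threshold          # assumed: "the first couple folks"
        self.reporters = defaultdict(set)   # domain -> users who reported it

    def report(self, domain, reporter):
        # A set means one user reporting repeatedly still counts once.
        self.reporters[domain.lower()].add(reporter)

    def is_flagged(self, domain):
        return len(self.reporters.get(domain.lower(), ())) >= self.threshold

blocklist = DestinationBlocklist()

# A spammer rotating through 200 throwaway domains gains little:
# each one is burned as soon as two recipients report it.
domains = [f"deal{i}.example" for i in range(200)]
for domain in domains:
    blocklist.report(domain, "alice")
    blocklist.report(domain, "bob")

print(all(blocklist.is_flagged(d) for d in domains))
```

Counting distinct reporters rather than raw reports is a design guess on my part; it keeps a single annoyed user from blacklisting a legitimate site on their own.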

Paul Graham suggested that the spam of the future would look like “Hey, check this out: [link]”, but it’s interesting to see that spammers are actually beyond that; they’re using harmless language in the body of the message to counteract the inherent spamminess of including a link.

And Over on Stage Left

There may be radically different solutions in the pipeline, however, such as the vicious Project Honey Pot, in which I’m a participant, having donated a subdomain and installed a honeypot here. Spammers must have nightmares about their harvest bots going awry and scooping up those innocent-looking honeypot addresses.

Mike

Discussion

  1. Hey Mike, check out Mozilla Thunderbird - its learning spam filter uses your preferences and isolates spam based on your definition of what garbage is.

    Take care, mate.

    Posted at 9:10 pm on March 28th by Philip.

  2. Oh, I’ve been using Thunderbird for ages (no Outlook on Gentoo!).

    But the point is that the vanilla filter settings have been compromised on two fronts: the ultra-simple Spam Of The Future, and the highly complex mails that use sophisticated HTML and CSS to carefully break up the naughty bits and hide blocks of innocent-sounding words.

    Posted at 5:15 pm on March 29th by Mike Purvis.


© 2004-2008, Mike Purvis, some rights reserved. I'm running Wordpress, and I have an RSS feed.