[personal profile] robbat2
Spammers are getting less and less original. I only check my Spam box once every few weeks on average, so some interesting patterns emerge.
This is a set of marginal spam, sorted by subject line. The spam rule for 'Pharm' triggered only on occurrences that contained that entire string. The others scored lower, and were only caught by their body text.

Now look at how their obfuscation algorithm works.
   1046 N * Mar 11 Andrew Esparza  (  22) Internet P/harmacy
   1047 N * Mar 05 Young E. Hahn   (  22) Internet P\harmacy
   1048 N * Mar 02 Lisa Kirkpatric (  22) Internet Ph!armacy
   1049 N * Mar 03 Rodger Keller   (  22) Internet Ph%armacy
   1050 N * Mar 01 Jami Hatfield   (  22) Internet Ph@armacy
   1051 N * Mar 08 Cristina Eubank (  22) Internet Ph]armacy
   1052 N * Mar 10 Arnulfo Cruz    (  22) Internet Pha(rmacy
   1053 N * Mar 11 Lula R. Chang   (  22) Internet Pha*rmacy
   1054 N * Mar 09 Clark Swan      (  22) Internet Pha/rmacy
   1055 N * Mar 07 Glen Woodruff   (  22) Internet Pha\rmacy
   1056 N * Mar 02 Hillary Abraham (  22) Internet Phar/macy
   1057 N * Mar 05 Freddy K. Stubb (  22) Internet Phar\macy
   1058 N * Mar 02 Charley Meeks   (  22) Internet Pharm(acy
   1059 N * Mar 06 Annmarie Kruege (  22) Internet Pharm*acy
   1060 N * Mar 10 Hollis Brown    (  22) Internet Pharm^acy
   1061 N * Mar 02 Sandra Barnard  (  22) Internet Pharma!cy
   1062 N * Mar 07 Johnathan W. Ba (  22) Internet Pharma*cy
   1063 N * Feb 28 Fay Kay         (  22) Internet Pharma/cy
   1064 N * Mar 01 Beverley Gibbs  (  22) Internet Pharma/cy
   1065 N * Mar 05 Deandre Carson  (  22) Internet Pharma@cy
   1066 N * Mar 08 Dexter Tatum    (  22) Internet Pharma@cy
   1067 N * Mar 03 Georgette Suthe (  22) Internet Pharma]cy
   1068 N * Mar 04 Bertha Oneil    (  22) Internet Pharmac(y
   1069 N * Feb 28 Don Neely       (  22) Internet Pharmac*y
   1070 N * Mar 07 Art Sewell      (  22) Internet Pharmac/y
   1071 N * Mar 06 Brian Ellison   (  22) Internet Pharmac[y
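The pattern in the listing looks like a single punctuation character dropped into an otherwise intact "Pharmacy". A minimal sketch of undoing that, assuming the obfuscation really is limited to one stray non-alphanumeric character between two letters; `deobfuscate` and `matches_rule` are hypothetical names, not part of any real spam filter:

```python
import re

# Assumption: the intruding character is any non-alphanumeric, non-space
# character with a letter on each side, e.g. "Pha*rmacy" or "P/harmacy".
_INTRUDER = re.compile(r"(?<=[A-Za-z])[^A-Za-z0-9\s](?=[A-Za-z])")

def deobfuscate(subject: str) -> str:
    """Remove punctuation that sits between two letters."""
    return _INTRUDER.sub("", subject)

def matches_rule(subject: str, keyword: str = "pharm") -> bool:
    """Case-insensitive keyword check against the cleaned-up subject."""
    return keyword in deobfuscate(subject).lower()

for s in ["Internet P/harmacy", "Internet Ph@armacy", "Internet Pharmac[y"]:
    print(s, "->", deobfuscate(s), matches_rule(s))
```

This is deliberately crude: it would also turn "e-mail" into "email" and "it's" into "its", so it only makes sense as a normalisation step before scoring, not as a rewrite of the message.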


I think if somebody could come up with an efficient algorithm to detect variants of a string with up to N characters wrong, spam detection could improve a reasonable amount.
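One way to make "N characters wrong" concrete is edit distance: the number of single-character insertions, deletions, or substitutions needed to turn one string into another. The sketch below is the textbook Levenshtein dynamic programme, swapped in here as an illustration rather than a claim about what any spam filter does; `edit_distance` and `within_n_errors` are names made up for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance; each insert, delete, or substitute costs 1."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]

def within_n_errors(word: str, target: str, n: int) -> bool:
    """True if word is at most n edits away from target, ignoring case."""
    return edit_distance(word.lower(), target.lower()) <= n

print(within_n_errors("Pharm*acy", "pharmacy", 1))
print(within_n_errors("Internet", "pharmacy", 1))
```

The cost is proportional to the product of the two string lengths, which is fine for subject-line words but is not the "efficient" algorithm wished for above when run against every word of every message.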

(no subject)

Date: 2005-03-12 04:54 am (UTC)
From: [identity profile] unixronin.livejournal.com
Try googling for 'agrep'. It's an approximate pattern-matching tool: you can tell it how many errors of what kinds you will accept and still consider it a match.
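To illustrate the idea of searching with an error budget, here is a small sketch of approximate substring search. It is not agrep's implementation (which, as far as I know, uses a bit-parallel method); it is a plain dynamic programme in which a match may start anywhere in the text. `fuzzy_find` and `max_errors` are hypothetical names for this example only.

```python
def fuzzy_find(pattern: str, text: str, max_errors: int) -> bool:
    """True if some substring of text is within max_errors edits of pattern."""
    pattern, text = pattern.lower(), text.lower()
    # col[i] = fewest edits to match pattern[:i] against some substring of
    # the text that ends at the current position.
    col = list(range(len(pattern) + 1))
    if col[-1] <= max_errors:
        return True
    for ch in text:
        new = [0]  # a match may begin at any position, so this costs nothing
        for i, pc in enumerate(pattern, start=1):
            cost = 0 if pc == ch else 1
            new.append(min(col[i] + 1,          # extra character in the text
                           new[i - 1] + 1,      # pattern character missing
                           col[i - 1] + cost))  # match or substitution
        col = new
        if col[-1] <= max_errors:
            return True
    return False

print(fuzzy_find("pharmacy", "Internet Pha*rmacy", 1))
print(fuzzy_find("pharmacy", "Internet Pha*rmacy", 0))
```

With a budget of one error every subject in the listing above would match "pharmacy"; with a budget of zero this reduces to an ordinary case-insensitive substring test.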
