I apologize for doing this, but the recent onslaught of spammers (~35 in the last 5 days) has left me with no choice: I've changed comments from non-friends to be screened by default AND to require a captcha.

In the early hours of this morning, a spammer managed to get the IP of the Gentoo list server on the NiX Spam RBL... simply by spamming the subscribe address :-(. This caused approximately 2000 deliveries of normal list mail to be rejected while the server was present on the RBL.

Log details

Why did this happen? I do agree on the importance of spamtrap accounts, but they MUST check the content of their messages. A list confirmation message MUST NOT be considered as spam.

The original subscribe request came from what seems to be a compromised server in Secunderabad, India, so it wouldn't have been detected by an RBL focused on modem/dialup addresses.
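For anyone unfamiliar with how an RBL listing is checked: DNS-based blocklists are normally queried by reversing the IPv4 octets and looking the result up under the list's zone. A minimal sketch of that lookup follows. The zone `rbl.example.org` and both function names are placeholders of mine, not anything taken from the logs or from a particular list's documentation.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name used to ask a blocklist about an IPv4 address."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the blocklist answers with an A record for the address.

    Needs working DNS. Any lookup failure is treated here as "not listed",
    which is a simplification: it cannot tell NXDOMAIN from a resolver outage.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:
        return False


# 192.0.2.10 is a documentation address and the zone is a placeholder.
print(dnsbl_query_name("192.0.2.10", "rbl.example.org"))
```

A list that only covers dynamic/dialup ranges would answer "not listed" for a compromised server in a hosting range, which is why that kind of RBL would not have helped here.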

Short of raising the bar to subscribe (with a specific token that needs to be included, and then it's only a matter of time till spammers include it too), there isn't much we can do to block stuff like this at the list-server level. There is no way to detect that an address is a spamtrap. There cannot be, by definition, as the spammers would otherwise avoid such addresses themselves.

Since I posted 15 hours ago, I've gotten another 6 spam comments from the same algorithm.

An interesting further trend I noticed on them: Given the same comment input body, they post the identical text, further lending credence to the concept that it's an automatic system.
Seems like I'm out of luck? This will greatly hamper anything which I want to do with the data.
If I wanted to store my data on the user's drive or on a server, though, I'm pretty-much out of luck.
The spammer's source for the sentence was:
I'm not sure what's with the sudden wave of spammers hitting LiveJournal. However, they are using some smarter behavior now.

They came to my attention because their spam engines take some comments or posts, use them as input data for Google, and then use the Google results as the text of their comment. It's statistical noise, but it closely resembles the input noise, so it's beating Bayesian analysis as well.

From a discussion about LDAP lookup failures:
There is actually a user and group for trousers. Also gentoo tends to like certain id/groups as certain UID/GID. If that group is used it will then user the next available GID/UID. Sometimes through later upgrades this becomes an issue. Usually badly written ebuilds. That is why I let the script make it.
Of course, not all, but many ebuilds are plainly badly written or gives bad implementations. One example, I update my OpenLDAP setup.

In a wave, over the last 3 hours:
They have a single post, with a large block of filler text from somewhere totally random. Followed by a bunch of links with pornish titles, and matching the following pattern:
RANDOM = appears to be foreign dictionary words
CC = belgique, espana, france, quebec, suissse
TLD = .com, except for espana, which is under .es
WORDS = title words for porn stuff, no spaces

In regards to my previous post, I should clarify that I was looking at patterns in spam, and wondering about taking advantage of the spammers.

Here's another one that turned up in my spam folder since I went to bed. Obviously this stock is well past its prime, but it rocketed to roughly 480% of its starting price (from $0.012 to $0.058, a gain of about 380%) between the opening of business Thursday and the opening of business Friday: MHII.OB.

Corey Shields pointed out that the markets are closed today, as it's Martin Luther King Day, but I will stick to my prediction about VTSS anyway.

I'm wondering, from an IT point of view, if we noticed enough of the spam, esp. early during the spam runs, could we profit from the actions of spammers?
What do all of these have in common?

They are all stock symbols, that were promoted via pump-and-dump scams in the last 4 days.

Let's look at one that I saw last week for a moment:
Starting Monday Jan 8th, RRLB went up nearly 120% overnight (from $0.04 to $0.10). Then around midday Tuesday there was a big sell, followed by another big sell Wednesday.

Now if we look at the others (these are all the ones that really stand out, the others don't have enough data available for me to make a conclusion).
MENV - 40% growth
NWOG - 20% growth
PRTH - 80% growth
SFWJ - 50% growth
UTVG - 80% growth in less than 12 hours.
VTSS - 10% growth since yesterday, but this spam run only appears to be starting.

I want to make a prediction here, to test a theory. VTSS will gain at least another 20% before Wednesday.
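To keep myself honest when checking a prediction like this, the test is just a percentage gain against a baseline price. A small sketch follows. The function names and the prices in the example are hypothetical, not real VTSS quotes.

```python
def pct_gain(start: float, end: float) -> float:
    """Percentage change from start to end (positive means the price rose)."""
    if start <= 0:
        raise ValueError("start price must be positive")
    return (end - start) / start * 100.0


def prediction_holds(baseline: float, later: float, min_gain_pct: float = 20.0) -> bool:
    """True if the price gained at least min_gain_pct relative to the baseline."""
    return pct_gain(baseline, later) >= min_gain_pct


# Hypothetical prices, for illustration only.
print(prediction_holds(1.00, 1.25))  # True
print(prediction_holds(1.00, 1.10))  # False
```

The baseline matters: "another 20%" is measured from the price at the time of the prediction, not from the price before the spam run started.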
I have returned from my brief honeymoon. I'll write about it in more detail soon, but for now, a point outline only:

  • Fri Dec 1st - Travel and the Taxi driver who didn't know where the hotel was.
  • Sat Dec 2nd - In which we acquire bicycles.
  • Sun Dec 3rd - Waterfalls, lava tubes, and a close encounter of the paved kind.
  • Mon Dec 4th - Recuperation - ouch! everything hurts
  • Tue Dec 5th - Replacement bicycle, and exploring Hilo.
  • Wed Dec 6th - The finding and snorkelling of a mismanaged reef.
  • Thu Dec 7th - Boarding pass SSSS's & social engineering.

Email statistics
Here are statistics on new email that I received while I was away. This excludes all mailing-list email, which is not subject to spam filtering, as the lists are extremely clear of spam and my procmail rules shuffle that email into separate folders quite well. Including it would add another ~3000 non-spam emails to the count, but those are not really relevant to spam categorization success rates.
I have my spam settings reasonably conservative, as I don't mind deleting spam that makes it through the filters, but false positives are a much larger concern.
total new messages: 2191
total spam: 1771
false positives: 1 (0.045% of total)
false negatives: 446 (20.3% of total, 25.2% of spam)
The false negatives are getting very interesting now. Random chunks of online documents, incl. sentences from the document used as subjects, with an attached image as the actual spam, or cleverly merged HTML+CSS that would render the spam text over the other text. Two of them appeared to be chunks of the MySQL documentation.
The Gentoo mail aliases like mysql-bugs@g.o appear to be very badly hit with spam, accounting for nearly 70% of the false negatives. This is also possibly because I have to trust the relaying of the Gentoo email servers, and cannot check the machine that the email originally came from.
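For reference, the percentages in the statistics above can be recomputed from the raw counts. A short sketch follows. The function name and dictionary keys are mine, and the results agree with the quoted figures to within rounding.

```python
def filter_rates(total: int, spam: int, false_pos: int, false_neg: int) -> dict:
    """Express false positives/negatives as percentages of the given counts."""
    def pct(part: int, whole: int) -> float:
        return 100.0 * part / whole

    return {
        "fp_of_total": pct(false_pos, total),
        "fn_of_total": pct(false_neg, total),
        "fn_of_spam": pct(false_neg, spam),
    }


# Counts taken from the statistics above.
rates = filter_rates(total=2191, spam=1771, false_pos=1, false_neg=446)
for name, value in rates.items():
    print(f"{name}: {value:.3f}%")
```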

