Google's Sly Battle on Comment Spam and Our Respective Policy

by why in -h

It’s great to hear that Google has started a simple solidarity against web spam. The gist of it is:

 <a rel="nofollow" href="http://spamjockey.com/">Spam Jockey!!</a>

You add the rel="nofollow" to external links in your Wiki or blog. Google (and other cooperating search engines) ignore those links. In time, spammers will get the picture and leave Wikis and blogs alone. It’s a noble effort and I think it’s worthwhile.
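For the curious, here is a minimal sketch of how a blog engine might tag external links in a comment before display. It is written in Python purely for illustration. The names `NofollowRewriter` and `add_nofollow` and the `own_host` default of `example.org` are assumptions invented for this example, not part of any real blog software, and a production version would also want a proper HTML sanitizer.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlparse


class NofollowRewriter(HTMLParser):
    """Re-emit an HTML fragment, adding rel="nofollow" to external links.

    Hypothetical helper: HTML comments and declarations are dropped,
    and self-closing tags are re-emitted as an open/close pair.
    """

    def __init__(self, own_host):
        super().__init__(convert_charrefs=False)
        self.own_host = own_host
        self.out = []

    def _is_external(self, href):
        # Relative links have no host, so they count as internal.
        host = urlparse(href or "").netloc
        return bool(host) and host != self.own_host

    def handle_starttag(self, tag, attrs):
        attrs = list(attrs)
        if tag == "a" and self._is_external(dict(attrs).get("href")):
            # Replace any existing rel value with a plain nofollow.
            attrs = [(k, v) for k, v in attrs if k != "rel"]
            attrs.append(("rel", "nofollow"))
        rendered = "".join(
            f" {k}" if v is None else f' {k}="{escape(v, quote=True)}"'
            for k, v in attrs
        )
        self.out.append(f"<{tag}{rendered}>")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def add_nofollow(fragment, own_host="example.org"):
    """Return the fragment with external <a> tags marked rel="nofollow"."""
    parser = NofollowRewriter(own_host)
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)


print(add_nofollow('<a href="http://spamjockey.com/">Spam Jockey!!</a>'))
```

Links without a host (such as `/about`) and links to `own_host` are left alone, so only outbound links lose their weight.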

For now, I’m not going to touch this, though. I want comments here to be weighed in Google’s search results. Our topic is just so specific and our readership rather small in contrast to so many sites.

When spam becomes a problem, though, external links here will be tagged with rel="nofollow". I’d like to couple it with a technique for approving links.

said on 19 Jan 2005 at 23:38

For what it’s worth, I had an awful problem with comment spam when I was using MovableType. However, since moving to Rublog and using my own custom comment engine—with the poor-man’s captcha built into it—I get maybe one or two spam comments per month.

Don’t know if a captcha is more trouble than you’d like to bother with, though. Plus, I’m sure you have a much larger readership than I do, so you may be targeted more frequently by spammers.

said on 20 Jan 2005 at 05:03

.. I’d like to couple it with a technique for approving links ..

Is this where the Bayesian classifier you mentioned before could help?

said on 20 Jan 2005 at 14:37

There was an interesting outlook on nofollow in a kuro5hin article recently.

said on 20 Jan 2005 at 15:42

Well, it’s like the guy on kuro5hin said: the nofollow stuff only kills the spammers who want PageRank.

That’s why I’m saying, I’d probably have a process where:

  1. User places a comment, the links are tagged with nofollow.
  2. Should the comment fail a Bayesian filter, the comment is not allowed.
  3. RedHanded maintainers get a periodic e-mail (or watch an RSS feed) with a list of unapproved comments.
  4. Comment spam is scanned and added to the filter’s word list, then deleted.
  5. Approved comments appear with links free of nofollow.
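To make steps 2 and 4 a bit more concrete, here is a rough sketch of a word-count naive Bayes filter, in Python for illustration only. Nothing here is the filter RedHanded would actually use: the `BayesFilter` class, its method names, the tokenizer and the Laplace smoothing are all assumptions chosen to keep the example short.

```python
import math
import re
from collections import Counter


class BayesFilter:
    """Toy naive Bayes spam filter (hypothetical, for illustration)."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    @staticmethod
    def tokens(text):
        # Very crude tokenizer: lowercase runs of letters, digits, quotes.
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        """Add a comment's words to the 'spam' or 'ham' word list."""
        self.words[label].update(self.tokens(text))
        self.docs[label] += 1

    def spam_probability(self, text):
        """Estimate P(spam | text) with Laplace (add-one) smoothing."""
        vocab = len(set(self.words["spam"]) | set(self.words["ham"])) or 1
        total_docs = sum(self.docs.values())
        toks = self.tokens(text)
        scores = {}
        for label in ("spam", "ham"):
            # Smoothed prior, then one smoothed likelihood per token.
            score = math.log((self.docs[label] + 1) / (total_docs + 2))
            n = sum(self.words[label].values())
            for tok in toks:
                score += math.log((self.words[label][tok] + 1) / (n + vocab))
            scores[label] = score
        # Turn the two log scores into a probability; clamp to avoid overflow.
        diff = max(min(scores["ham"] - scores["spam"], 50.0), -50.0)
        return 1.0 / (1.0 + math.exp(diff))


spam_filter = BayesFilter()
spam_filter.train("cheap pills buy now casino", "spam")
spam_filter.train("ruby blocks and iterators are nice", "ham")
print(spam_filter.spam_probability("buy cheap pills") > 0.5)
```

Step 4 then amounts to calling `train(comment_text, "spam")` on each comment a maintainer flags, so the word list grows with every moderation pass.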

The approval process would be very simple. A page with the unapproved comments listed. Check a box if it’s spam. Submit the page and the comments are all processed based on the checkbox states.
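The form handler behind that page could be as small as the sketch below. It is a guess at the shape, not actual RedHanded code: the `Comment` record, the `process_moderation` function and the `learn_spam` callback are hypothetical names, and the example is in Python for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    """Hypothetical comment record awaiting moderation."""
    id: int
    body: str
    approved: bool = False
    nofollow: bool = True  # links stay tagged until a maintainer approves


def process_moderation(pending, spam_ids, learn_spam):
    """Apply one submitted moderation page.

    pending    -- list of unapproved Comment objects shown on the page
    spam_ids   -- set of comment ids whose "spam" box was checked
    learn_spam -- callback that feeds spam text to the filter's word list
    Returns the comments that were approved; flagged ones are dropped.
    """
    approved = []
    for comment in pending:
        if comment.id in spam_ids:
            learn_spam(comment.body)   # scan it into the filter ...
            continue                   # ... then delete by not keeping it
        comment.approved = True
        comment.nofollow = False       # approved links lose nofollow
        approved.append(comment)
    return approved


learned = []
pending = [Comment(1, "Nice post!"), Comment(2, "buy cheap pills")]
kept = process_moderation(pending, {2}, learned.append)
print([c.id for c in kept], learned)
```

Everything not checked is treated as approved, which keeps the page to a single checkbox per comment.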

said on 21 Jan 2005 at 02:09

Should the comment fail a bayesian filter,

You gave me the idea to work (for my pleasure) on a Bayesian filter .. I’m not sure it will be as powerful as already existing versions (in other languages though), but it’s very interesting to work on this kind of problem ;) Thanks for this _why :)

said on 21 Jan 2005 at 11:07

How about bsfilter? This is targeted at mails, though.

said on 21 Jan 2005 at 11:44

I wasn’t aware of bsfilter … I will have a look at it … In fact, as it’s just a kind of ‘toy project’ for me, it would be interesting to see other real-project implementations!

Comments are closed for this entry.