Let's get rid of CAPTCHA systems. Testing for humanity isn't the answer to blog spam and form signup spoofing.
No more CAPTCHAs
Before I left Texas for Silicon Valley I had an interesting conversation with a Flash developer friend of mine about creating mini games as Turing tests for defeating comment spam on blogs. They would be infinitely more fun than identifying squiggly letters on a grid.
Maybe a mini Asteroids game in which you had to reach 10 points before you could continue? Or other simple games that we all love (Tetris, matching tiles, etc.).
[Image: An example of a game-based CAPTCHA]
The hope is that if you make posting a new comment costly (in time, thought, or money), then you defeat much of the spam that would otherwise make it to your blog, forum, or what have you. We are working on these games.
However, the reality is that CAPTCHA systems are already compromised. Have a peep at this example of code by Casey Chesnut: an AI system that will defeat moderate CAPTCHA systems. If you create a computerized test, it makes sense that a computerized system can then be created to defeat it.
Even more important than AI systems is a Boing Boing article on porn advertisers using free porn as an incentive to get people to defeat CAPTCHAs and post spam for them.
Other humans are the real threat.
This is where Turing tests start breaking down. If spammers incent humans to take simple tests confirming humanity, then no system will ever stop them. If the reward is greater than the investment in time or complexity, how can it?
Spammers get people to sign up for Yahoo! accounts and Hotmail e-mail and, most importantly, to comment on blogs and forums all over the Web with offers for free creams and airline tickets (I have had them even on this small blog).
Here is my proposal on how to prevent comment spam when free porn is on the line.
How about an approval system?
I agree that no one likes approval systems. The Web is real time, baby! You want action and reaction. Post the first comment on a new post on a popular blog and feel like your day is complete. Start a flame on a forum, revel in the chaos.
Traditional approval systems just gum up the works. When you add a checkpoint on a highway, everything behind it slows down. Same with a Web site: sales, traffic, and interest are all tied to real time. But so is good data. A comment stream full of spam isn't worth posting to. Nor can I trust the owner of that stream when they don't have time to clean up their posts. What are the chances my comment will be read when it is buried among offers for free medicine?
The owner of a site doesn't have time to post new articles, manage the day-to-day conversations and build new interest if they have to approve comments as they arrive. And certainly they don't want to pay someone just to perform this approval task for them. Or do they?
Introducing Amazon's Mechanical Turk. As Amazon puts it: "Complete simple tasks that people do better than computers. And, get paid for it". As a system developer, you can create automated systems that let humans interact with your data and give you feedback on it for a modest fee.
Bingo. The same model that spammers use to incent humans to defeat your humanity-checker is the one we can use to prevent spam. And in real time (or close to it). If we set up a system where comments are moderated as they are posted and results show up in a timely manner, we achieve the goal of filtering spam and delivering a real-time Web experience.
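Here is a rough Python sketch of that hold-and-release flow, just to make the idea concrete. Everything in it is my own assumption: the names `ModerationQueue`, `Comment`, `Status` and the `send_for_review` hook are hypothetical, and the hook stands in for whatever actually hands a comment to a paid human reviewer. It does not call any real Mechanical Turk API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class Status(Enum):
    PENDING = "pending"    # waiting on a human reviewer
    APPROVED = "approved"  # visible to readers
    SPAM = "spam"          # hidden from readers


@dataclass
class Comment:
    comment_id: int
    text: str
    status: Status = Status.PENDING


class ModerationQueue:
    """Holds new comments until a human reviewer returns a verdict."""

    def __init__(self, send_for_review: Callable[[Comment], None]):
        # send_for_review is a hypothetical hook that forwards the comment
        # to a human reviewer; how it does that is outside this sketch.
        self._send_for_review = send_for_review
        self._comments: Dict[int, Comment] = {}
        self._next_id = 1

    def submit(self, text: str) -> Comment:
        """Store a new comment as PENDING and send it out for review."""
        comment = Comment(self._next_id, text)
        self._comments[comment.comment_id] = comment
        self._next_id += 1
        self._send_for_review(comment)
        return comment

    def record_verdict(self, comment_id: int, is_spam: bool) -> None:
        """Apply the reviewer's decision to a previously submitted comment."""
        comment = self._comments[comment_id]
        comment.status = Status.SPAM if is_spam else Status.APPROVED

    def visible_comments(self) -> List[Comment]:
        """Only approved comments are shown on the site."""
        return [c for c in self._comments.values() if c.status is Status.APPROVED]
```

The poster gets a `Comment` back immediately with a pending status, so the site can show "awaiting review" instead of blocking, and the comment appears once `record_verdict` marks it approved.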
You are thinking, at what price? Having a staffing center open 24 hours a day doesn't seem cost effective.
I tried out Mechanical Turk the night Amazon released it. For about 10 minutes I matched up CD cover images to albums in a database. I think I was paid 1/3 of a cent per successful match. I earned about 2 cents in those 10 minutes. Of course I was poor talent, as I was more interested in the system than the money. I could have made more if I had treated the task as a source of income.
Today, after logging into my dashboard account on Amazon, I see offers of work paying from 1 cent to 5 dollars per task that I complete successfully. The better the work pays, the longer it obviously takes to complete. Funnily enough, much of it is related to music. I guess they thought I did a good job during my 10 minutes of prior work. But the majority of the work pays 1 cent or less per item. If you added up all the comments I have had on this blog for the last year and paid 1 cent per approval, I think I would approach a few dollars' worth of labor.
As a work requester in Mechanical Turk there are several thresholds you can add to help train a pool of talent for your data. You can have double validation done as you start to recruit talent. This vetting period helps you gain qualified talent to work on your items. How hard is verifying that a post is genuine and not spam? I'm guessing 1 cent per comment is a fair price to pay. At 15 posts validated per minute (or 900 posts an hour), someone could make 9 dollars an hour at 1 cent per post for spam validation. Beats flipping burgers.
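To check that back-of-the-envelope math, here is the same arithmetic as a few lines of Python. The function names are mine, and the rate and price are just the guesses from the paragraph above, not real Mechanical Turk figures.

```python
def hourly_earnings_dollars(posts_per_minute: int, cents_per_post: int) -> float:
    """What a reviewer earns per hour at a steady validation rate."""
    posts_per_hour = posts_per_minute * 60
    return posts_per_hour * cents_per_post / 100


def review_cost_dollars(comment_count: int, cents_per_post: int) -> float:
    """What a site owner pays to have comment_count comments reviewed once."""
    return comment_count * cents_per_post / 100


# 15 posts a minute is 900 posts an hour; at 1 cent each that is 9 dollars.
print(hourly_earnings_dollars(15, 1))  # 9.0
```

Note that double validation means paying for each comment twice, so the owner's cost doubles for as long as that setting is on.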
How would we implement a system to utilize Mechanical Turk?
First off, we need a company or organization willing to do centralized queuing. The more items you send to Amazon, the better rates you get for the humans doing the work for you. Having a small blog with 10 comments a month pay Amazon directly for the work won't be as cost effective as if Google Blogger had all comments on the Blogger system sent to Amazon.
If a subscriber paid a modest tiered fee to a company that consolidated all of the comments before sending them off to Amazon Mechanical Turk, they would realize the best value. The organization performing the queuing would probably realize the most profit this way too.
[Image: An example of what a status after posting would look like]
As an added measure, the queuing company could set up a triage system that runs heuristic pattern matching against all incoming posts to see whether it has ever rated a similar post before. This would have worked really well against the Casey Chesnut CAPTCHA-defeating system, where his bot posted the same comment to 94 different blogs. The first comment would have been sent out for human validation, but each one after that would have been flagged as spam, just as the first one was. This duplicate matching of spam would also be returned instantly by the system, speeding up its response to flood spam (a common problem on forums).
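A minimal sketch of that triage step in Python, under some loud assumptions. It only catches exact duplicates after normalizing case and whitespace, which is a much simpler stand-in for real heuristic pattern matching. The `Triage` class, the `fingerprint` helper and the `ask_human` callback are all hypothetical names, and `ask_human` answers immediately here, whereas a real human review would come back later.

```python
import hashlib
import re
from typing import Callable, Dict


def fingerprint(text: str) -> str:
    """Normalize case and whitespace, then hash, so trivial edits still match."""
    normalized = re.sub(r"\s+", " ", text.strip().lower())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class Triage:
    """Sends a comment to a human only the first time it is seen."""

    def __init__(self, ask_human: Callable[[str], bool]):
        # ask_human is a hypothetical stand-in for the paid review step;
        # it returns True when the reviewer says the comment is spam.
        self._ask_human = ask_human
        self._verdicts: Dict[str, bool] = {}
        self.human_reviews = 0  # how many comments we actually paid for

    def is_spam(self, text: str) -> bool:
        key = fingerprint(text)
        if key not in self._verdicts:
            # First sighting: pay for one human review and remember the answer.
            self.human_reviews += 1
            self._verdicts[key] = self._ask_human(text)
        # Repeat sightings reuse the stored verdict at no cost.
        return self._verdicts[key]
```

With this in place, a bot posting the same comment to 94 blogs would cost one human review; the other 93 copies get the stored verdict straight away.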
And there we have it. Defeating human spam with human validation seems like the best way to solve the blog spam conundrum.
Anyone want to start this service?
Additional related items to think about.
Instead of Amazon's Mechanical Turk, we could use a distributed computing system to have people approve comments via e-mail, through a special web service, etc. By no means does this have to be an economic gain for anyone if you can get free labor through some means.
Well-designed CAPTCHA systems need to be built with accessibility and compatibility in mind. Lawsuits, liability and ethics might concern your company or organization. Why take the risk if you don't have to?
Created 86 weeks, 13 hours ago