Sunday, April 27, 2008

stop spam. read books.

I just discovered something so cool I had to put off homework a little while longer and blog about it. I was reading the excellent blog learning.now ("At the crossroads of Internet culture & education with host Andy Carvin"), and I noticed that the comment form at the bottom of the entry required the commenter to type two obscured words, to make sure only real people could comment.

The first thing that struck me was that it asked for two words instead of the standard one. The second thing I noticed was an audio alternative for people who are visually impaired. This is fantastic. Then I noticed "stop spam. read books," and I was very intrigued. I went to the reCAPTCHA website and learned more.

It turns out they work with book digitization projects like the Internet Archive. When you type in the distorted words, one of them is a word that OCR software could not recognize in a scanned book (I guess the computer knows when it's not reading a word right). They pair one word they already know with one they don't, so if you get the known word right, they figure you probably got the other one right too. They also give the same word image to multiple people to get a higher accuracy rate.

So, for example, in the image above, they might already know that the first word is "Germantown," but their OCR software couldn't read the second word, "were." When I type in "Germantown were," I am helping the Internet Archive turn scanned images of a book page into digital text. Awesome. It looks like you can add it to your WordPress blog, among others, but I didn't see a way to attach it to Blogger.
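Just to convince myself how the two-word trick could work, here is a rough Python sketch. To be clear, this is not reCAPTCHA's actual code. The function names, the case-insensitive matching, and the three-vote threshold are all my own guesses, made up for illustration.

```python
from collections import Counter


def normalize(word):
    # Assumption: matching ignores case and surrounding whitespace.
    return word.strip().lower()


def record_guess(votes, control_answer, typed_control, typed_unknown):
    """Count the guess for the unknown word only if the known word was typed correctly.

    Returns True if the person passed the check, False otherwise.
    """
    if normalize(typed_control) != normalize(control_answer):
        return False
    votes[normalize(typed_unknown)] += 1
    return True


def consensus(votes, min_votes=3):
    """Return the most popular guess once it has at least min_votes, else None.

    The threshold of 3 is a made-up number, not something the site states.
    """
    if not votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count >= min_votes else None


votes = Counter()
record_guess(votes, "Germantown", "Germantown", "were")
record_guess(votes, "Germantown", "germantown", "were")
record_guess(votes, "Germantown", "Germantwon", "wore")  # known word wrong, so this guess is ignored
record_guess(votes, "Germantown", "Germantown", "were")
print(consensus(votes))  # prints "were"
```

The idea is that a wrong answer on the known word throws out the whole submission, and the unknown word only gets accepted once enough people agree on it.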

2 comments:

Eric said...

thanks for the info! was wondering what that meant!

Anonymous said...

Holy shit man, thanks lil bro