
What is "Captcha"?



Posted by: golddust

I found a link to this information on another forum, having previously ignored the same link on sites that run the full-featured "captcha" program.
At Web-Life, as soon as programmers are able to do the work, "captcha" will be added to this forum. Some of you may have noticed the occasional spam and "distasteful" posts that the mods and admins are constantly deleting. These posts are often made by 'bots' (automated programs taught to log in to various places and post spam and rude messages).
Captcha is a program you will frequently find included in the login procedure of a forum or other program to keep the bots out.

Here is some pretty cool information about the "captcha" program:
A CAPTCHA is a program that can tell whether its user is a human or a computer. You've probably seen them — colorful images with distorted text at the bottom of Web registration forms. CAPTCHAs are used by many websites to prevent abuse from "bots," or automated programs usually written to generate spam. No computer program can read distorted text as well as humans can, so bots cannot navigate sites protected by CAPTCHAs.

Quote:
About 60 million CAPTCHAs are solved by humans around the world every day. In each case, roughly ten seconds of human time are being spent. Individually, that's not a lot of time, but in aggregate these little puzzles consume more than 150,000 hours of work each day. What if we could make positive use of this human effort? reCAPTCHA does exactly that by channeling the effort spent solving CAPTCHAs online into "reading" books.
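
As a quick sanity check on the figures in the paragraph above, the arithmetic can be worked through in a few lines of Python. The two constants are the numbers quoted in the text; the function name is only illustrative.

```python
# Figures quoted in the paragraph above.
CAPTCHAS_PER_DAY = 60_000_000
SECONDS_PER_CAPTCHA = 10


def hours_spent_per_day(captchas: int, seconds_each: int) -> float:
    """Convert a daily CAPTCHA count and per-puzzle time into hours."""
    return captchas * seconds_each / 3600  # 3600 seconds in an hour


total_hours = hours_spent_per_day(CAPTCHAS_PER_DAY, SECONDS_PER_CAPTCHA)
print(f"{total_hours:,.0f} hours per day")  # prints "166,667 hours per day"
```

That works out to roughly 166,667 hours, which is consistent with the "more than 150,000 hours" quoted.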

To archive human knowledge and to make information more accessible to the world, multiple projects are currently digitizing physical books that were written before the computer age. The book pages are being photographically scanned, and then, to make them searchable, transformed into text using "Optical Character Recognition" (OCR). The transformation into text is useful because scanning a book produces images, which are difficult to store on small devices, expensive to download, and cannot be searched. The problem is that OCR is not perfect.
reCAPTCHA improves the process of digitizing books by sending words that cannot be read by computers to the Web in the form of CAPTCHAs for humans to decipher. More specifically, each word that cannot be read correctly by OCR is placed on an image and used as a CAPTCHA. This is possible because most OCR programs alert you when a word cannot be read correctly.

But if a computer can't read such a CAPTCHA, how does the system know the correct answer to the puzzle? Here's how: Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct.
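
The pairing scheme described above can be sketched roughly as follows. This is a minimal illustration, not reCAPTCHA's actual code: the class name, method names, and the `votes_needed` threshold are all assumptions (the text only says the new word is shown to "a number of other people").

```python
from collections import Counter
from typing import Optional


class UnknownWordTally:
    """Collects human readings for one word that OCR could not read.

    Hypothetical sketch: a reading only counts if the same user also
    typed the paired control word (whose answer is known) correctly.
    """

    def __init__(self, votes_needed: int = 3):
        # votes_needed is an assumed threshold, not a documented value.
        self.votes_needed = votes_needed
        self.votes = Counter()

    def submit(self, control_answer: str, control_guess: str,
               unknown_guess: str) -> bool:
        """Return True if the user passed the control word."""
        if control_guess.strip().lower() != control_answer.strip().lower():
            return False  # failed the known word, so ignore the other guess
        self.votes[unknown_guess.strip().lower()] += 1
        return True

    def accepted_reading(self) -> Optional[str]:
        """Return the most common reading once enough users agree on it."""
        if not self.votes:
            return None
        word, count = self.votes.most_common(1)[0]
        return word if count >= self.votes_needed else None


tally = UnknownWordTally(votes_needed=2)
tally.submit("morning", "Morning", "upon")    # passes control, one vote
tally.submit("overlooks", "overlok", "up0n")  # fails control, ignored
tally.submit("inquiry", "inquiry", "upon")    # passes control, second vote
print(tally.accepted_reading())
```

Each user sees a different control word, so the known answer is passed in per submission, while the votes for the unread word accumulate across users.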

Currently, we are helping to digitize books from the Internet Archive.

Digitizing Books One Word at a Time

http://recaptcha.net/images/sample-ocr.gif
The words above come from scanned books.
By typing them, you help to digitize old texts.




Posted by: forwardone

Excellent article, golddust. We intend to implement this feature as soon as the programmer has time available.



Posted by: Pete Berg

This is an important term that is generally related to the most important language, ASP.NET.



Posted by: Spunner

I'm looking forward to it. (Perhaps after mid-December, when computers are my life, I can look into it - if it hasn't already been done).

But, here's an interesting article that points to the fact that this is all an arms-race:

http://www.silicon.com/research/spe...39120541,00.htm
Quote:
Spammers turn to free porn to beat Hotmail security

Latest anti-spam measure scuppered?

By Munir Kotadia

Published: Friday 7 May 2004

By offering free porn, spammers are using internet surfers to bypass a security protection designed to stop bot software from automatically opening web mail accounts.

Free web mail services such as Hotmail and Yahoo! are often used by spammers to send unsolicited emails but because of the sheer quantity of email sent, spammers require thousands of accounts and employ web bots to automatically open them.

To combat this automation, web mail companies started using the Captcha test (Completely Automated Public Turing test to tell Computers and Humans Apart), which creates a graphically distorted representation of a simple word that can easily be read by a human but not by a machine. The word is often written in an unusual font and presented on a patterned background to further confuse the bots.

To open an email account, the applicant is asked to read the word in the Captcha graphic and type it into an application form. Because the disguised word is virtually impossible for a computer to read, spammers need a human to intervene, which ruins their automation process.

However, as first noted in the Boing Boing blog earlier this year, some spammers have found an ingenious way to bypass the Captcha protection.

First, the spammers open and advertise a website containing pornography. Visitors to the porn site are asked to enter the word contained in a Captcha graphic before they are granted access.

In the background, spammers have already used scripts to automate the web mail accounts opening process to the point where they need a human to "read" the Captcha graphics. The Captcha graphics from the web mail site are transferred to the porn site, where the porn consumers interpret the Captcha words. As soon as they enter the correct word, the script can complete its application process and the visitors are rewarded with free porn.

Simon Perry, vice president of security at Computer Associates International, said security is always a "moving target," and as soon as a company like MSN uses a new technology to secure a product or service, it is only a matter of time before it will be bypassed.

"Each little improvement makes it a little bit more difficult for the spammers. This is an exercise in continually moving up the bar," he said.

According to Perry, the only way to make a real difference is to combine technology with legislation and enforce that legislation. However, he said that even though spammers may have found a way past the Captcha, it is still slowing them down.

"Before the Captcha, those bots could open a million Hotmail accounts a day, but now, if they can attract 10,000 people to their free porn site, they can set up 10,000 accounts, which is a lot but still an order of magnitude less," Perry said.

Neither Microsoft's Hotmail nor Yahoo! would comment on the issue.

Munir Kotadia writes for ZDNet UK




Posted by: golddust

Spunner, this article is from May 2004. Certainly there have been more recent developments. I have to think all the web-based email programs are subject to hackers beating their security, etc.



Posted by: Spunner

I didn't notice the date, but that is what spammers are *still* doing to overcome CAPTCHA - didn't know it had been happening that long. Nice catch.



Posted by: Spunner

Quote:
November 20, 2007

Has CAPTCHA Been "Broken"?

A recent Wall Street Journal article describes Ticketmaster's problems with online scalpers:
The Internet era has brought speed and convenience to all sorts of consumer transactions. For concertgoers, however, it has also led to ever-faster sellouts for hot events. Ticketmaster deploys technology that is supposed to stop brokers from gaining access to large numbers of seats via online sales. But it says brokers' software circumvents the company's protections. That has placed large numbers of seats in the hands of brokers who use eBay Inc.'s StubHub, Craigslist and other online venues to resell the tickets at a big mark up.

One situation roiling consumers involves the 54-concert "Best of Both Worlds" tour in which singer-actress Miley Cyrus is performing sets as herself and as her fictional alter ego, Hannah Montana. Parents and children have found finding tickets for the shows difficult and expensive. The issue is drawing the attention of government officials. On Thursday -- in a rare Internet-age example of authorities enforcing antiscalping laws -- the attorneys general of Missouri and Arkansas filed lawsuits against people accused of illegally reselling Hannah Montana tickets.

According to StubHub, tickets for "Best of Both Worlds" are currently selling for an average $237, making them pricier than seats for the Police ($209), Justin Timberlake ($182) and Beyoncé ($212). The highest face value for a ticket on the Hannah Montana tour: $63.

They must have really pissed off some high ranking political parents to get that kind of attention. Not that they don't deserve it-- scalpers are evil, profiteering *******s, to be sure. They deserve all the pain we can send their way.

The "technology that is supposed to stop brokers" they're referring to is CAPTCHA.
For instance, companies like Ticketmaster require customers searching for tickets online to replicate a set of the squiggly letters and numbers, known as a "Captcha." Theoretically, only human customers can correctly identify the characters despite the odd fonts, screening out automated purchasing programs. But RMG's software, according to Mr. Kovach, can also "figure out the randomly generated characters and retype them automatically." Mr. Kovach said RMG employees also gave him advice on fooling Ticketmaster's computers into thinking his requests were coming from different Internet addresses. Neither Mr. Kovach nor his lawyer could be reached for comment.
So if online scalpers are somehow beating the system, does that mean CAPTCHA has been broken? I covered this topic a year ago, and my opinion has not changed. If CAPTCHAs were well and truly broken, Google, Yahoo, and Hotmail would stop using them. Why would they continue to use something that doesn't work? I'm not going to rehash all the arguments here, but if you have strong feelings on this topic, I urge you to read my earlier post before commenting.

Ticketmaster's problem is that their CAPTCHA is not good enough. Programmers don't seem to understand what makes a CAPTCHA difficult to "break". But it's not difficult to find out. Heck, the hackers themselves will tell you how to do CAPTCHA correctly if you just know where to look. For example, this Chinese hacker's page breaks down a number of common CAPTCHAs, and the price of software he sells to defeat them at a certain percentage success rate:

[See the actual post or the hacker's page to see the examples]

It seems an awful lot of programmers subscribe to the "add some crazy patterns and/or colors to the text and pray for the best" school of CAPTCHA design. That's not only sloppy, it just doesn't work. The top of this chart is littered with their failed attempts. On some sites, this is OK. They don't need the same world-class level of protection from bots and scripts that Ticketmaster does-- there's tremendous financial incentive for scalpers to break their system.

This particular hacker estimated a 50% success rate against the Ticketmaster CAPTCHA, long before the above article was published. No wonder those parents weren't able to buy their kids Hannah Montana tickets-- it's not because of failings in CAPTCHA protection, it's because the Ticketmaster programmers failed to implement CAPTCHA correctly.

Instead of hacking together their own partially effective (and often not even human solvable) CAPTCHA, what Ticketmaster's programmers should have done is studied prior art-- in particular, by outright copying the high-volume, extensively researched Yahoo, Google, and Hotmail CAPTCHAs. I'm awfully fond of Google's CAPTCHA technique; in my professional opinion, it is simultaneously the most readable and the most hellishly difficult to OCR correctly. If you need industrial strength protection from bots and scripts, that's where you want to start.

Posted by Jeff Atwood




Posted by: forwardone

Interesting article indeed. Google's version of "captcha" isn't universally welcomed, though, it would seem.

http://www.spy.org.uk/spyblog/2005/...pyware_cap.html



