Slideshow transcript
Slide 1: CAPTCHA Are you Human? (Sorry, I had to ask) Ecaterina Valică http://students.info.uaic.ro/~evalica/ 1
Slide 2: CAPTCHA Agenda • What is CAPTCHA? • Types of CAPTCHA • Where to use CAPTCHAs? • Guidelines when making a CAPTCHA • Ways to break CAPTCHAs • reCAPTCHA • Human Computation Games
Slide 3: CAPTCHA Example: Filling out a form Google uses CAPTCHA for Gmail accounts: 3
Slide 4: CAPTCHA Beginnings Completely Automated Public Turing test to tell Computers and Humans Apart Created in 2000 for Yahoo to prevent automated e-mail account registration, by Luis von Ahn, Manuel Blum, Nicholas Hopper and John Langford, Carnegie Mellon University. 4
Slide 5: CAPTCHA Inventor Luis von Ahn (1978 - ) Photograph by Mike McGregor 5
Slide 6: CAPTCHA What is CAPTCHA? A program that can tell whether its user is a human or a computer. It uses a type of challenge-response test to determine that the response is not generated by a computer. 6
Slide 7: CAPTCHA Turing Test "Standard Interpretation": player C, the interrogator, is tasked with determining which player, A or B, is a computer and which is a human.
Slide 8: CAPTCHA Reverse Turing Test A CAPTCHA is sometimes described as a reverse Turing test, because it is administered by a machine and targeted at a human.
Slide 9: CAPTCHA So, CAPTCHA is… A program that can generate and grade tests that: Most humans can pass; Current computer programs cannot pass. 9
Slide 10: CAPTCHA Making a CAPTCHA 1. Pick a random string of characters (or words), e.g. ifhkfp; 2. Render it into a distorted image.
Slide 11: CAPTCHA Making a CAPTCHA … and the program generates a test: Type the characters that appear in the image 11
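The two steps on slides 10 and 11 (pick a random string, then ask the user to type it back) can be sketched roughly as follows. This is a minimal illustration, not the original CAPTCHA program: the function names, the 6-letter length and the lowercase alphabet are assumptions, and the distortion step is left out because it would need an imaging library.

```python
import hmac
import secrets
import string


def make_challenge(length: int = 6) -> str:
    """Pick a random string of lowercase letters (hypothetical helper)."""
    return "".join(secrets.choice(string.ascii_lowercase) for _ in range(length))


def grade(expected: str, response: str) -> bool:
    """Return True when the typed response matches the challenge text."""
    # Ignore surrounding whitespace and letter case; compare in constant time.
    return hmac.compare_digest(
        expected.lower().encode(), response.strip().lower().encode()
    )
```

A real deployment would render the string from `make_challenge` into a distorted image and keep the expected text on the server.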
Slide 12: CAPTCHA Outperform the computers In many simple tasks, a typical 5-year-old can outperform the most powerful computers. Easier for computers: • medical diagnosis, • playing chess. Hard for computers: • operations requiring vision, hearing, language or motor control.
Slide 13: CAPTCHA Type: Early CAPTCHAs Generated by the EZ-Gimpy program; Used previously on Yahoo! 13
Slide 14: CAPTCHA Type: Improved CAPTCHA high contrast for human readability; medium, per-character perturbation; random fonts per character; low background noise; 14
Slide 15: CAPTCHA Type: A modern CAPTCHA rather than attempting to create a distorted background and high levels of warping on the text; focus on making segmentation difficult by adding an angled line; 15
Slide 16: CAPTCHA Type: A modern CAPTCHA another way to make segmentation difficult is to crowd symbols together; this can be read by humans but cannot be segmented by bots; 16
Slide 17: CAPTCHA Other Types of CAPTCHA Animated CAPTCHAs 3D CAPTCHA ASCII art Reverse CAPTCHA "Leave this field blank" 17
Slide 18: CAPTCHA Other: Cognitive Puzzles Distinguish pictures of dogs from cats Choose a word that relates to all the images Trivia questions Math and word problems 3D Object CAPTCHA Solve failed OCR inputs 18
Slide 19: CAPTCHA Other: Distinguish pictures Microsoft Asirra (Animal Species Image Recognition for Restricting Access); KittenAuth Project . 19
Slide 20: CAPTCHA Other: Mathematical CAPTCHA 20
Slide 21: CAPTCHA Other: Mathematical CAPTCHA 21
Slide 22: CAPTCHA Other: 3D Object CAPTCHA You must enter them in the exact sequence listed: • The Head of the Walking Man, • The Vase, • The Back of the Chair. 22
Slide 23: CAPTCHA Other: Jumble Game 23
Slide 24: CAPTCHA Other: Drupal Examples 24
Slide 25: CAPTCHA Other: Tests "Common sense" questions: • "What is 3 + 5?" • "What color is the sky?" Type the word 'orange'; Require a valid email to approve; These attempts violate CAPTCHA principles: • they cannot be automatically generated; • they can be easily cracked given the state of AI.
Slide 26: CAPTCHA Where to use CAPTCHAs? Data Collection Worms and Spam Preventing Comment Spam in Blogs Protecting Email Addresses From Scrapers Online Polls Protecting Website Registration Preventing Dictionary Attacks Search Engine Bots 26
Slide 27: CAPTCHA Where to use CAPTCHAs? Preventing Comment Spam in Blogs. Protecting Email Addresses From Scrapers. To hide your email address, require users to solve a CAPTCHA before showing it. Online Polls. You cannot trust the results of an online poll, because anybody could write a program to vote for their favorite option thousands of times.
Slide 28: CAPTCHA Where to use CAPTCHAs? Protecting Website Registration. (E-mail services: Yahoo, Microsoft, Google) Preventing Dictionary Attacks (in password systems). Prevent a computer from iterating through the entire space of passwords by requiring it to solve a CAPTCHA after a certain number of unsuccessful logins. Search Engine Bots. It is sometimes desirable to keep webpages unindexed to prevent others from finding them easily.
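The dictionary-attack defence on slide 28 could look something like the sketch below. The threshold of 3 failures, the in-memory counter and the function names are all assumptions for illustration; the slide only says "a certain number of unsuccessful logins".

```python
from collections import defaultdict

MAX_FREE_ATTEMPTS = 3  # assumed threshold, not taken from the slides

# Failed login count per account (in-memory only, for illustration).
failed_logins = defaultdict(int)


def captcha_required(username: str) -> bool:
    """True once the account has used up its CAPTCHA-free attempts."""
    return failed_logins[username] >= MAX_FREE_ATTEMPTS


def record_login(username: str, success: bool) -> None:
    """Reset the counter on success, increment it on failure."""
    if success:
        failed_logins.pop(username, None)
    else:
        failed_logins[username] += 1
```

Legitimate users who mistype once or twice never see a CAPTCHA, while a program guessing passwords is stopped after the third failure.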
Slide 29: CAPTCHA Guidelines Image Security. Images of text should be distorted randomly before being presented to the user. Script Level Security. Insecurities: • Systems that pass the answer in plain text; • Systems where a solution to the same CAPTCHA can be used multiple times ("replay attacks"). 29
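One way to avoid both script-level insecurities named on slide 29 is to keep the answer on the server and let each challenge be checked only once. The sketch below assumes a simple in-memory dictionary and hypothetical function names; it is not taken from any particular CAPTCHA library.

```python
import hmac
import secrets

# Server-side store: token -> expected answer. The answer never reaches the client.
_pending = {}


def issue_challenge(answer: str) -> str:
    """Remember the answer on the server and return an opaque random token."""
    token = secrets.token_urlsafe(16)
    _pending[token] = answer
    return token


def verify(token: str, response: str) -> bool:
    """Check a response; each token is consumed on first use, right or wrong."""
    expected = _pending.pop(token, None)  # pop() defeats replay of the same token
    if expected is None:
        return False
    return hmac.compare_digest(expected.encode(), response.encode())
```

Because `verify` removes the token before comparing, a solved CAPTCHA cannot be submitted a second time, and the page only ever carries the token.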
Slide 30: CAPTCHA Guidelines Security Even After Wide-Spread Adoption. There are CAPTCHAs that would be insecure if a significant number of sites started using them. • Example: text-based questions; • A parser could easily be written that would allow bots to bypass the test; • Such “CAPTCHAs” rely on the fact that few sites use them, and thus that a bot author has no incentive to program their bot to solve that challenge. 30
Slide 31: CAPTCHA Guidelines Accessibility. • Visual CAPTCHAs prevent visually impaired users (for example, due to a disability or because the image is difficult to read) from accessing the protected resource; • Such users rely on a screen reader, which, on reaching an image, can only read out its caption; • Solution: let users opt for an audio CAPTCHA.
Slide 32: CAPTCHA Guidelines: Accessibility Hard to read CAPTCHAs:
Slide 33: CAPTCHA Guidelines: Accessibility Worst CAPTCHAs:
Slide 34: CAPTCHA Ways to break CAPTCHAs Exploiting bugs in the implementation that allow the attacker to completely bypass the CAPTCHA; Improving Character Recognition software (OCR – Optical Character Recognition ); Using cheap human labor to process the tests (sweatshops). 34
Slide 35: CAPTCHA Break: Insecure implementation Re-using the session ID of a known CAPTCHA image. Some CAPTCHAs pass a hash of the solution to the client as a validation key; the solution space is often small enough that the hash can be cracked. Other implementations use only a small fixed pool of CAPTCHA images (Asirra: 3 million).
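To see why a hash of the solution is a weak validation key, consider this sketch. It assumes an unsalted SHA-256 hash and a 4-letter lowercase answer, both chosen only for illustration; the slide does not say which hash or answer length real sites used.

```python
import hashlib
import itertools
import string


def crack(digest_hex: str, length: int, alphabet: str = string.ascii_lowercase):
    """Try every string of the given length until one hashes to digest_hex."""
    for letters in itertools.product(alphabet, repeat=length):
        guess = "".join(letters)
        if hashlib.sha256(guess.encode()).hexdigest() == digest_hex:
            return guess
    return None


# What a weak site would send to the client alongside the image.
leaked = hashlib.sha256(b"gimp").hexdigest()
```

With 26 letters and length 4 there are only 456,976 candidates, so `crack(leaked, 4)` finishes almost instantly. Keeping the answer on the server, or using a keyed MAC with a secret the client never sees, closes this hole.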
Slide 36: CAPTCHA Break: Character Recognition Programs that have the following functions: • Extraction of the image from the web page • Removal of background clutter, for example with color filters and detection of thin lines; • Segmentation, i.e. splitting the image into regions each containing a single letter; • Identifying the letter for each region. 36
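The segmentation step on slide 36 can be illustrated with a deliberately naive method: split the image wherever a column contains no ink. The representation (a list of rows of 0/1 pixels) and the function name are assumptions, and real attacks use far more robust techniques.

```python
def segment_columns(image):
    """Split a binary image (rows of 0/1, 1 = ink) into (start, end) column spans.

    Each span is a maximal run of columns containing at least one ink pixel;
    end is exclusive.
    """
    if not image:
        return []
    width = len(image[0])
    has_ink = [any(row[x] for row in image) for x in range(width)]
    spans, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x  # a new letter region begins
        elif not ink and start is not None:
            spans.append((start, x))  # blank column closes the region
            start = None
    if start is not None:
        spans.append((start, width))
    return spans
```

When symbols are crowded together or joined by a line, as on slides 15 and 16, no blank column separates them and this approach returns a single span, which is exactly why those designs make segmentation hard.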
Slide 37: CAPTCHA Attacks – EZ-Gimpy 2000 Yahoo's early CAPTCHA, called "EZ-Gimpy"; The program picks a word from a dictionary and produces a distorted, noisy image of the word; Algorithm for breaking EZ-Gimpy (92% success): 1. Locate possible letters at various locations; 2. Construct graph of consistent letters; 3. Look for plausible words in the graph.
Slide 38: CAPTCHA Attacks – EZ-Gimpy 2000 EZ-Gimpy Graph of Letters Possible Letters Plausible Words 38
Slide 39: CAPTCHA Attacks – SimpleOCR Engine 2002 39
Slide 40: CAPTCHA Attacks – Jan/Feb 2008 • Google (Jan 17): 20% • Hotmail (Feb 6): 30-35% • Yahoo (Feb 22): 30-35%
Slide 41: CAPTCHA Attacks – Projects Several CAPTCHA-breaking projects: • http://libcaca.zoy.org/wiki/PWNtcha • http://www.lafdc.com/captcha/
Slide 42: CAPTCHA Break: Human solvers Attacks that use humans to solve the puzzles; Approaches: • relaying the puzzles to a group of human operators who can solve CAPTCHAs; • copying the CAPTCHA images and using them as CAPTCHAs for a high-traffic site owned by the attacker.
Slide 43: CAPTCHA CAPTCHA Sweatshops A computer fills out a form and, when it reaches a CAPTCHA, hands it to a human operator to solve. Weakness for Asirra: • if the database of cat and dog photos can be downloaded, • then paying workers $0.01 to classify each photo • means that almost the entire database of photos can be classified for $30,000. Once an IP address has misclassified a challenge, a human just needs to solve two Asirras in a row from the same browser session.
Slide 44: CAPTCHA CAPTCHA Sweatshops Not Economically Viable • $2.50 / h for each human; • 720 CAPTCHAs per hour per human; • about 1/3 cent per account. A typical spam run of 1 million messages per day would cost $14,000 per day and require 116 people working 24/7.
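The per-account figure on slide 44 follows directly from the wage and the solving rate, as this small check shows. Only that figure is recomputed here; the daily totals on the slide depend on assumptions the slide does not state, so they are not rederived. The variable names are illustrative.

```python
WAGE_PER_HOUR = 2.50     # dollars per human per hour, from the slide
CAPTCHAS_PER_HOUR = 720  # CAPTCHAs solved per human per hour, from the slide

cost_per_captcha = WAGE_PER_HOUR / CAPTCHAS_PER_HOUR  # dollars
cents_per_captcha = cost_per_captcha * 100            # roughly a third of a cent
seconds_per_captcha = 3600 / CAPTCHAS_PER_HOUR        # solving pace implied by the rate
```

At this rate each operator solves one CAPTCHA every 5 seconds, and each solved CAPTCHA costs about 0.35 cents.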
Slide 45: CAPTCHA Porn Companies (October 2007) They write a program that fills out the entire registration form (e.g. Yahoo’s); When the program gets to the CAPTCHA, it can’t solve it; So it copies the CAPTCHA to the porn page; A visitor there sees a screen saying that to see the next picture, they have to type the word shown in that CAPTCHA.
Slide 46: CAPTCHA Porn Companies (October 2007) 46
Slide 47: CAPTCHA Next CAPTCHA Generation CAPTCHAs can be made stronger, but they are already too advanced for a large percentage of Internet users; CAPTCHA devolves from a simple human reading test into an intelligence test or an acuity test. 47
Slide 48: CAPTCHA reCAPTCHA (2007) New form of CAPTCHA that also helps digitize books; The words displayed to the user come directly from old books that are being digitized; Words that OCR could not identify; 48
Slide 49: CAPTCHA reCAPTCHA Pairs an unknown word with a known one; Distorts them both, puts a line through them and then sends them out to be proofread; The respondent answers both elements: • half of the effort validates the challenge; • the other half is captured as work.
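The pairing described on slide 49 can be sketched as below: the known word decides pass or fail, and the answer for the unknown word is recorded as a vote. The function names, the vote store and the two-vote threshold are assumptions for illustration, not the actual reCAPTCHA implementation.

```python
from collections import Counter, defaultdict

# Votes for each unknown word image: image id -> Counter of transcriptions.
votes = defaultdict(Counter)


def grade_pair(control_answer, control_response, unknown_id, unknown_response):
    """Pass or fail on the known word; record the unknown word only on a pass."""
    passed = control_response.strip().lower() == control_answer.lower()
    if passed:
        votes[unknown_id][unknown_response.strip().lower()] += 1
    return passed


def best_guess(unknown_id, min_votes=2):
    """Return the leading transcription once it has min_votes, else None."""
    if not votes[unknown_id]:
        return None
    word, count = votes[unknown_id].most_common(1)[0]
    return word if count >= min_votes else None
```

Requiring agreement between several respondents who each passed the control word is what lets the unknown half be trusted as digitization work.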
Slide 50: CAPTCHA reCAPTCHA 50
Slide 51: CAPTCHA Time spent Roughly 60 million CAPTCHAs are solved each day; On average it takes about 10 seconds to solve a CAPTCHA; People around the world waste more than 150,000 hours each day solving CAPTCHAs;
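The total on slide 51 can be checked from the two figures the slide gives. The variable names are illustrative.

```python
CAPTCHAS_PER_DAY = 60_000_000  # from the slide
SECONDS_EACH = 10              # average solving time, from the slide

# 600 million seconds per day, converted to hours.
hours_per_day = CAPTCHAS_PER_DAY * SECONDS_EACH / 3600
```

This gives about 166,667 human-hours per day, consistent with the slide's "more than 150,000 hours".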
Slide 52: CAPTCHA Time spent A fifth of those users would give 30,000 man-hours of work daily; It would constitute the world's fastest and most accurate character-recognition computer, processing 10 million words a day. Recreating the books, word by word
Slide 53: CAPTCHA Time spent 9 billion human-hours of Solitaire were played in 2003; • Empire State Building: 7 million human-hours (6.8 hours of Solitaire); • Panama Canal: 20 million human-hours (less than a day of Solitaire).
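The comparisons on slide 53 come from spreading the year's Solitaire play evenly over every hour of the year. The sketch below reproduces them; the even-spread assumption and the names are mine, not stated on the slide.

```python
SOLITAIRE_HOURS_2003 = 9e9  # human-hours of Solitaire in 2003, from the slide
HOURS_IN_YEAR = 365 * 24    # 8760 clock hours

# Human-hours of Solitaire played worldwide per clock hour.
solitaire_per_hour = SOLITAIRE_HOURS_2003 / HOURS_IN_YEAR


def solitaire_equivalent(project_human_hours):
    """Clock hours of worldwide Solitaire play matching a project's effort."""
    return project_human_hours / solitaire_per_hour


empire_state = solitaire_equivalent(7e6)   # about 6.8 hours
panama_canal = solitaire_equivalent(20e6)  # about 19.5 hours, under a day
```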
Slide 54: CAPTCHA Wasted human cycles If the world's computer Solitaire players could be coaxed into enjoying a game that contributed to solving a computing problem, von Ahn calculates, it would produce billions of man-hours of labor each year. "Make all of humanity more efficient by exploiting the human cycles that get wasted."
Slide 55: CAPTCHA Wasted human cycles People will contribute their brainpower, but only if they're given an enjoyable, time-killing experience in exchange. Most projects that harness human processing power rely on a different motivator: money. Which produces better results — a small group of experts or a huge mob of amateurs? 55
Slide 56: CAPTCHA Human Computation Things that we humans can do and computers cannot, like: • Labeling images with words; • Picking out a voice in a noisy room; Humans have trouble remembering long, random strings of characters, yet they excel at remembering faces and objects.
Slide 57: CAPTCHA Symbiotic relationship One in which humans solve some problems and computers solve others; Image search: a method that could give us accurate textual descriptions of every image on the Web;
Slide 58: CAPTCHA The ESP Game Two-player online game; Partners don’t know each other and can’t communicate; Object of the game: Type the same word; The only thing in common is an image; 58
Slide 59: CAPTCHA The ESP Game • Player 1 guesses: CAR, HAT, KID; • Player 2 guesses: BOY, CAR; Success! Both players are told: You agree on CAR.
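The agreement rule of the ESP Game (two players who cannot communicate succeed as soon as one types a word the other has already typed) can be sketched as follows. The function name and the event format are assumptions, and the guess order in the usage example is one possible interleaving of the guesses on slide 59.

```python
def first_agreement(guesses):
    """Return the first word both players have typed, or None.

    guesses is a time-ordered list of (player, word) pairs, player being 1 or 2.
    """
    seen = {1: set(), 2: set()}
    for player, word in guesses:
        word = word.strip().upper()
        other = 2 if player == 1 else 1
        if word in seen[other]:
            return word  # the partner already typed this word
        seen[player].add(word)
    return None


# One possible ordering of the guesses shown on the slide.
session = [(1, "CAR"), (2, "BOY"), (1, "HAT"), (2, "CAR"), (1, "KID")]
```

Here `first_agreement(session)` yields `CAR`, which then becomes a label for the image the two players were shown.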
Slide 60: CAPTCHA The ESP Game The ESP Game has been licensed (2006) by Google in the form of the Google Image Labeler, and is used to improve the accuracy of the Google Image Search. “5000 people playing simultaneously can label all images on Google in 30 days!” 60
Slide 61: CAPTCHA http://gwap.com/gwap/ 61
Slide 62: CAPTCHA http://gwap.com/gwap/ Games: ESP; Tag a Tune; Matchin
Slide 63: CAPTCHA http://gwap.com/gwap/ Games: Squigl; Verbosity
Slide 64: CAPTCHA Future Games Language translation. A game could challenge two players who don’t speak the same language to translate text from one language to the other. Monitoring of security cameras. Players could monitor security cameras and alert authorities about suspected illegal activity. 64
Slide 65: CAPTCHA Future Games Improving Web search. People have varying degrees of skill at searching for information on the Web. A game could be designed in which the players perform searches for other people. Text summarization. Imagine a game in which people summarize important documents for the rest of the world. 65
Slide 66: CAPTCHA Still not thinking big enough "If we have that many people all doing some little part, we could do something insanely huge for humanity." "We'll never run out of things to digitize" 66
Slide 67: CAPTCHA 67
Slide 68: CAPTCHA Bibliography Site: Luis von Ahn Website (2006) Site: reCAPTCHA (2007) Site: CAPTCHA (2007) Site: Gwap (2008) Interview: „Using “captchas” to digitize books “ (2007) Interview: „For Certain Tasks, the Cortex Still Beats the CPU“ (2007) 68
Slide 69: CAPTCHA Bibliography Video: Wired – „Human Computation“ (2007) Video: Google TechTalks – “Human Computation” (2006) Paper: „Games With a Purpose“ (2006) Paper: „How Lazy Cryptographers do AI“ (2004) Paper: „CAPTCHA: Using Hard AI Problems for Security“(2003) 69
Slide 70: CAPTCHA Bibliography Article: “CAPTCHA is Dead, Long Live CAPTCHA!” (2008) Article: „Yahoo's CAPTCHA Security Reportedly Broken“ (2008) Article: „Anti-CAPTCHA operations on Microsoft Mail“ (2008) Article: „Google’s CAPTCHA busted in recent spammer tac“ (2008)
Slide 71: CAPTCHA Bibliography Paper: „Recognizing Objects in Adversarial Clutter“ (2002) Article: Wikipedia CAPTCHA (2008) Article: „CAPTCHA Effectiveness” (2006) Article: „Breaking a Visual CAPTCHA“ (2002) Article: „Human or Computer? Take This Test“ (2002) Site: XKCD (2008) 71
Slide 72: CAPTCHA Thank you! 72


