Breaking CAPTCHA or why you should be using reCAPTCHA V3

Implementing Captcha

Early versions of captcha distort text by applying various transformations (e.g. added lines, character rotation, reduced spacing between characters). The purpose is to confound robots while still giving humans enough information to decipher the text.

Figure: Sample captcha images generated by SimpleCaptcha (squiggly characters and lines; an added line to confound bots; reduced space between characters).
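
To make these transformations concrete, here is a minimal Python sketch of how a generator might plan them before drawing anything. It is not SimpleCaptcha's code: the `Glyph` class, the `layout_captcha` function, and all parameter names and default values are hypothetical, chosen only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str
    x: float         # left edge of the character, in pixels
    rotation: float  # tilt in degrees


def layout_captcha(text, char_width=30, height=40, squeeze=0.6,
                   max_rotation=25, seed=None):
    """Plan a distorted captcha: tilted, tightly packed glyphs plus one noise line.

    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    glyphs = []
    for i, ch in enumerate(text):
        # Space reduction: advance by less than a full character width so
        # neighbouring glyphs overlap and are harder to segment.
        x = i * char_width * squeeze
        # Character rotation: tilt each glyph by a random angle.
        angle = rng.uniform(-max_rotation, max_rotation)
        glyphs.append(Glyph(ch, x, angle))

    # Noise line: one stroke across the full width, with random end heights.
    width = glyphs[-1].x + char_width if glyphs else 0
    line = ((0, rng.uniform(0, height)), (width, rng.uniform(0, height)))
    return glyphs, line
```

A drawing library would then render each glyph at its planned position and angle and stroke the line on top. Keeping the plan separate from the rendering makes the three distortions easy to see and to tune independently.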

Breaking SimpleCaptcha

To start, we will attempt to solve a simple captcha implementation with Tesseract OCR, using the steps below.

  1. First, we set up a new console project using .NET Core.

Figure: tessdata subfolder configuration and loadout
Figure: Program.cs for TesseractCaptcha
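
The same OCR step can also be sketched in Python instead of .NET. The version below is a rough equivalent under assumptions, not the TesseractCaptcha program itself: it relies on the third-party pytesseract and Pillow packages and a local Tesseract install, and the helper names `ocr_captcha` and `mean_confidence` are made up for this example.

```python
from statistics import mean


def mean_confidence(confs):
    """Average Tesseract's per-word confidences (0 to 100).

    Tesseract reports -1 for layout boxes that contain no word, so those
    entries are skipped. Values may arrive as strings or numbers.
    """
    scores = [float(c) for c in confs]
    scores = [s for s in scores if s >= 0]
    return mean(scores) if scores else 0.0


def ocr_captcha(image_path, lang="eng", tessdata_dir=None):
    """Return (text, mean confidence) for one captcha image.

    pytesseract and Pillow are imported lazily so that mean_confidence
    can be used without them installed.
    """
    import pytesseract
    from PIL import Image

    # --psm 7 asks Tesseract to treat the image as a single line of text.
    config = "--psm 7"
    if tessdata_dir:
        # Point Tesseract at a custom folder of trained data files.
        config += f' --tessdata-dir "{tessdata_dir}"'

    data = pytesseract.image_to_data(
        Image.open(image_path),
        lang=lang,
        config=config,
        output_type=pytesseract.Output.DICT,
    )
    text = "".join(word for word in data["text"] if word.strip())
    return text, mean_confidence(data["conf"])
```

Calling `ocr_captcha("captcha.png", tessdata_dir="./tessdata")` (the file name is a placeholder) would return the recognized text together with an average confidence on a 0 to 100 scale.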

Tesseract OCR in action

Running it gives the following output

Figure: OCR results using eng_best trained data, only 39% confidence
Figure: Higher confidence at 49%

Concluding thoughts

The code sample provided is intentionally bare, as the intent is not to demonstrate working samples for breaking captchas but rather to show that machine learning and trained models are good enough to crack most traditional captcha systems without breaking a sweat.

Jeffery Tay
