Chapter

Pattern Recognition

Volume 7329 of the series Lecture Notes in Computer Science pp 155-165

Breaking reCAPTCHAs with Unpredictable Collapse: Heuristic Character Segmentation and Recognition

  • Claudia Cruz-Perez (CENTIA, Department of Computing, Electronics, and Mechatronics, Universidad de las Américas Puebla)
  • Oleg Starostenko (CENTIA, Department of Computing, Electronics, and Mechatronics, Universidad de las Américas Puebla)
  • Fernando Uceda-Ponga (CENTIA, Department of Computing, Electronics, and Mechatronics, Universidad de las Américas Puebla)
  • Vicente Alarcon-Aquino (CENTIA, Department of Computing, Electronics, and Mechatronics, Universidad de las Américas Puebla)
  • Leobardo Reyes-Cabrera (CENTIA, Department of Computing, Electronics, and Mechatronics, Universidad de las Américas Puebla)

Abstract

In this paper we present a novel approach for automatic segmentation and recognition of reCAPTCHAs on Web sites. It is based on CAPTCHA image preprocessing with character alignment, morphological segmentation with three-color bar character encoding, and heuristic recognition. The original contribution is the use of a three-color bar code for CAPTCHA characters, which makes their segmentation robust in the presence of randomly collapsed, overlapping letters and of distortions produced by particular patterns of waving rotation. Additionally, a novel implementation of an SVM-based learning classifier for recognizing combinations of characters in the training corpus is proposed; it more than doubles the recognition success rate without increasing the system response time. The main goal of this research is to reduce the vulnerability of CAPTCHAs to spam and fraud, as well as to provide a novel approach for recognizing either handwritten texts or degraded and damaged texts in ancient manuscripts. The framework implementing the proposed approach has been tested in real-time applications on sites that use CAPTCHAs, achieving a segmentation success rate of about 82% and a recognition success rate of about 94%.

Keywords

reCAPTCHA breaking; segmentation attack; unpredictable collapse; three-color bar character encoding; heuristic classifier