Arabic reCAPTCHA Service for Enhancing Digitization of Arabic Manuscripts

  • Research Article - Computer Engineering and Computer Science
Arabian Journal for Science and Engineering

Abstract

reCAPTCHA is a security measure that guards web applications against automated bot abuse by presenting a randomly generated challenge for users to solve. These challenges must be hard for computers yet easy for humans. In this paper, we present a cloud-based Arabic reCAPTCHA service that protects Arabic websites against automated abuse. In addition, the proposed service is designed to improve the accuracy of digitizing printed Arabic manuscripts compared with traditional digitization using optical character recognition (OCR) software. The architectural design, algorithms, implementation and deployment guidelines presented in this paper are not limited to the Arabic language and can serve as the basis for developing a reCAPTCHA service for any other language. The paper discusses the need for an Arabic reCAPTCHA service and then presents an original system architecture, design and implementation. We also propose solutions and algorithms for a number of design and implementation challenges. First, we devise a scheme to properly extract word images from scanned pages to form reCAPTCHA challenges. Second, we propose a mechanism for classifying the extracted word images into known and unknown word sets. Third, we explore and propose two algorithms for processing user input to a reCAPTCHA challenge, which prepare the service response for user verification and, at the same time, store the user's guess for the digitization process. Fourth, we present a solution for maintaining data integrity while handling multiple concurrent user requests for reCAPTCHA challenges. Moreover, we show how the different components and subservices of the proposed Arabic reCAPTCHA system can be deployed on a public cloud such as Amazon Web Services. Finally, we conduct an experimental study to validate the efficacy of the service. The study shows that overall digitization accuracies of 97.67% and 96.73% were attained in two experimental setups, and that 72.2% of the audience preferred solving Arabic reCAPTCHA challenges over English reCAPTCHA challenges on Arabic websites.
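The abstract does not give the details of the two input-processing algorithms, so the following is only a minimal sketch of the general reCAPTCHA principle it builds on: a challenge pairs a known word with an unknown word, the user is verified against the known word alone, and the guess for the unknown word is stored as a vote toward digitization. The names `UnknownWord`, `process_response` and `promote`, the whitespace-only normalization, and the three-vote threshold are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UnknownWord:
    """A word image whose transcription is not yet trusted (hypothetical model)."""
    image_id: str
    guesses: Counter = field(default_factory=Counter)


def normalize(text: str) -> str:
    # Illustrative normalization only: trim and collapse whitespace.
    return " ".join(text.split())


def process_response(known_answer: str, known_guess: str,
                     unknown: UnknownWord, unknown_guess: str) -> bool:
    """Verify the user on the known word; if verified, record the unknown-word guess."""
    passed = normalize(known_guess) == normalize(known_answer)
    if passed:
        # Only guesses from verified users count as votes.
        unknown.guesses[normalize(unknown_guess)] += 1
    return passed


def promote(unknown: UnknownWord, min_votes: int = 3) -> Optional[str]:
    """Return the leading guess once it has at least min_votes votes, else None."""
    if not unknown.guesses:
        return None
    text, votes = unknown.guesses.most_common(1)[0]
    return text if votes >= min_votes else None
```

In this sketch a word would move from the unknown set to the known set once `promote` returns a transcription. A deployed service would additionally need to serialize concurrent updates to the vote counts, which is the data-integrity concern the abstract raises.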

Corresponding author

Correspondence to Khaled Salah.

About this article

Cite this article

Abubaker, H., Salah, K., Al-Muhairi, H. et al. Arabic reCAPTCHA Service for Enhancing Digitization of Arabic Manuscripts. Arab J Sci Eng 42, 3391–3408 (2017). https://doi.org/10.1007/s13369-017-2494-2

  • DOI: https://doi.org/10.1007/s13369-017-2494-2
