Arabic optical character recognition software: A review

  • Software And Hardware for Pattern Recognition and Image Analysis
Pattern Recognition and Image Analysis

Abstract

This paper provides a thorough evaluation of six prominent Arabic OCR systems available on the market: Abbyy FineReader, Leadtools, Readiris, Sakhr, Tesseract, and NovoVerus. We test the systems on 250 randomly selected images from the well-known Arabic Printed Text Image (APTI) database and on a set of 8 images from an Arabic book. The APTI database contains 45,313,600 word images, both decomposable and non-decomposable. The evaluation consists of two tests. The first is based on the usual metrics used in the literature. In the second, we propose a novel measure for the Arabic language, which can also be used for other non-Latin languages.
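
The abstract does not say which "usual metrics" the first test relies on. As an illustration only, and assuming the common edit-distance-based character and word error rates found in much of the OCR literature, a minimal sketch could look like the following. The function names are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r from the reference
                            curr[j - 1] + 1,      # insert h into the reference
                            prev[j - 1] + cost))  # substitute, or match if equal
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence, hyp: Sequence) -> float:
    """Edit distance normalised by the reference length."""
    if len(ref) == 0:
        return 0.0 if len(hyp) == 0 else 1.0
    return edit_distance(ref, hyp) / len(ref)


def char_error_rate(reference: str, ocr_output: str) -> float:
    # Assumption: one Unicode code point counts as one character.
    return error_rate(reference, ocr_output)


def word_error_rate(reference: str, ocr_output: str) -> float:
    # Assumption: words are separated by whitespace.
    return error_rate(reference.split(), ocr_output.split())
```

For example, `char_error_rate("كتاب", "كتب")` compares a four-letter ground-truth word with an OCR output that dropped one letter. Because the comparison is per code point, Arabic text would normally need consistent Unicode normalization and diacritic handling on both sides before scoring; that step is omitted here.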

Author information

Corresponding author

Correspondence to Faisal Alkhateeb.

Additional information

The article is published in the original.

Faisal Alkhateeb is an Associate Professor in the Department of Computer Sciences at Yarmouk University. He obtained his Ph.D. from Grenoble 1 University (2008), an M.Sc. from Grenoble 1 University (2004), an M.Sc. from Yarmouk University (2003), and his B.Sc. from Yarmouk University (1999). His interests include knowledge-based systems, knowledge representation and reasoning, intelligent systems, and constraint satisfaction and optimization problems. He became chairman of the Department of Computer Sciences at Yarmouk University in September 2010.

Iyad Abu Doush is an Associate Professor in the Department of Computer Science and Information Systems at the American University of Kuwait. He obtained his Ph.D. from the Computer Science Department at New Mexico State University, USA, in 2009. Dr. Abu Doush completed his B.Sc. in computer science and his M.Sc. in Computer Science and Information Systems at Yarmouk University, Jordan. He has supervised, advised, and refereed senior projects and master's theses, and has refereed for a number of journals. He served as coach and committee member in the ACM Jordanian Collegiate Programming Contest for three years. He has received research funding several times from agencies including USAID, Microsoft, the King Abdullah II Design and Development Bureau, the Deanship of Research and Graduate Studies at Yarmouk University, and the Jordanian Scientific Research Support Fund. He has published more than 40 articles in international journals and conferences, and was selected to serve as a visiting researcher at universities in Malaysia and Lithuania. His research interests include evolutionary algorithms, optimization, accessibility, and human-computer interaction.

Abdelraoaf Albsoul received his Ph.D. degree from Virginia Commonwealth University, Richmond, VA, in 2011. From 2009 to 2011, he worked as a lecturer in computer information systems at ECPI University, Newport News, VA. In 2011 he was appointed assistant professor in the computer science department at Yarmouk University. He worked as the Dean's assistant for student affairs from 2015 to 2016, and since 2016 he has served as chairman of the computer science department. His current research interests include signal and image processing, wireless sensor networks, natural language processing, and computational intelligent systems.

About this article

Cite this article

Alkhateeb, F., Abu Doush, I. & Albsoul, A. Arabic optical character recognition software: A review. Pattern Recognit. Image Anal. 27, 763–776 (2017). https://doi.org/10.1134/S105466181704006X

  • DOI: https://doi.org/10.1134/S105466181704006X
