Arabic optical character recognition software: A review

Alkhateeb, Faisal; Abu Doush, Iyad; Albsoul, Abdelraoaf

doi:10.1134/S105466181704006X

Software And Hardware for Pattern Recognition and Image Analysis
Published: 09 December 2017

Volume 27, pages 763–776, (2017)
Pattern Recognition and Image Analysis

Faisal Alkhateeb¹, Iyad Abu Doush¹,² & Abdelraoaf Albsoul¹

Abstract

This paper provides a thorough evaluation of six important Arabic OCR systems available on the market, namely Abbyy FineReader, Leadtools, Readiris, Sakhr, Tesseract, and NovoVerus. We test the OCR systems using 250 randomly selected images from the well-known Arabic Printed Text Image (APTI) database and a set of 8 images from an Arabic book. The APTI database contains 45,313,600 word images, both decomposable and non-decomposable. In the evaluation, we conduct two tests. The first test is based on the usual metrics used in the literature. In the second test, we provide a novel measure for the Arabic language, which can also be used for other non-Latin languages.
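
The abstract does not say which of the usual metrics the first test relies on. One measure that is common in OCR evaluation is the character error rate: the edit distance between the recognized text and the ground truth, divided by the ground-truth length. The Python sketch below is illustrative only and is not the authors' evaluation code; the function names and the sample strings are assumptions made for this example.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning one string into the other."""
    # previous[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # delete a reference character
                current[j - 1] + 1,      # insert a hypothesis character
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the length of the ground-truth text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    ground_truth = "كتاب"  # hypothetical ground-truth word (4 code points)
    ocr_output = "كتب"     # hypothetical OCR output with one letter missing
    print(character_error_rate(ground_truth, ocr_output))
```

The comparison here is per Unicode code point. For Arabic text, an evaluation would also have to decide how to treat diacritics and presentation forms before comparing, which this sketch leaves out.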

Author information

Authors and Affiliations

Computer Sciences Department, Yarmouk University, Irbid, 21163, Jordan
Faisal Alkhateeb, Iyad Abu Doush & Abdelraoaf Albsoul
Computer Science Department, American University of Kuwait, Salmiya, Kuwait
Iyad Abu Doush

Corresponding author

Correspondence to Faisal Alkhateeb.

Additional information

The article is published in the original.

Faisal Alkhateeb is an Associate Professor in the Department of Computer Sciences at Yarmouk University. He obtained his Ph.D. from Grenoble 1 University (2008), an M.Sc. from Grenoble 1 University (2004), an M.Sc. from Yarmouk University (2003), and his B.Sc. from Yarmouk University (1999). He is interested in knowledge-based systems, knowledge representation and reasoning, intelligent systems, and constraint satisfaction and optimization problems. He became the chairman of the Computer Sciences Department at Yarmouk University in September 2010.

Iyad Abu Doush is an Associate Professor in the Department of Computer Science and Information Systems at the American University of Kuwait. He obtained his Ph.D. from the Computer Science Department at New Mexico State University, USA, in 2009. Dr. Abu Doush completed his B.Sc. in computer science and his M.Sc. in Computer Science and Information Systems at Yarmouk University, Jordan. He has supervised, advised, and refereed senior projects and master's theses, and has refereed for a number of journals. He served as a coach and committee member in the ACM Jordanian Collegiate Programming Contest for three years. He has received research funding several times from different agencies, including USAID, Microsoft, the King Abdullah II Design and Development Bureau, the Deanship of Research and Graduate Studies at Yarmouk University, and the Jordanian Scientific Research Support Fund. He has published more than 40 articles in international journals and conferences, and was selected to serve as a visiting researcher at universities in Malaysia and Lithuania. His research interests include evolutionary algorithms, optimization, accessibility, and human-computer interaction.

Abdelraoaf Albsoul received his Ph.D. degree from Virginia Commonwealth University, Richmond, VA, in 2011. From 2009 to 2011, he worked as a lecturer in computer information systems at ECPI University, Newport News, VA. In 2011, he was appointed as an assistant professor in the computer science department at Yarmouk University. He worked as the Dean's assistant for student affairs from 2015 to 2016, and in 2016 he was selected as the computer science department chairman. His current research interests include signal and image processing, wireless sensor networks, natural language processing, and computational intelligent systems.

About this article

Cite this article

Alkhateeb, F., Abu Doush, I. & Albsoul, A. Arabic optical character recognition software: A review. Pattern Recognit. Image Anal. 27, 763–776 (2017). https://doi.org/10.1134/S105466181704006X

Received: 15 June 2017
Published: 09 December 2017
Issue Date: October 2017
DOI: https://doi.org/10.1134/S105466181704006X
