A Novel Approach to Printed Arabic Optical Character Recognition

Al Ghamdi, Mansoor A.

doi:10.1007/s13369-021-06163-9

Research Article-Computer Engineering and Computer Science
Published: 20 September 2021

Volume 47, pages 2219–2237, (2022)

Arabian Journal for Science and Engineering

Mansoor A. Al Ghamdi ORCID: orcid.org/0000-0002-2891-6374¹

Abstract

Optical character recognition (OCR) is widely used in real-world applications such as digitizing learning resources, assisting visually impaired people, and transforming printed resources into electronic media. For Arabic, the need to expand digital Arabic content on the Internet has recently motivated researchers to focus on Arabic text recognition. Despite the large number of studies on Arabic OCR, it still faces numerous challenges due to the special characteristics of the Arabic script. This research aims to develop an effective printed Arabic OCR system and describes its implementation. The system is divided into four stages: pre-processing, feature extraction, character segmentation, and classification. Unlike typical Arabic OCR systems, the developed system performs feature extraction before character segmentation. In the pre-processing stage, a novel thinning algorithm is applied to produce skeletons of the Arabic text images. In the second stage, a new chain code representation technique, which uses an agent-based model, is proposed for extracting features from non-dotted Arabic text images. Relying on the extracted features, a character segmentation technique is introduced to segment connected Arabic words into characters. In the classification stage, the prediction by partial matching (PPM) compression-based method is applied as a classifier to recognize the Arabic text. Experimental evaluation on a public dataset shows that the system achieves an accuracy of 77.3% on paragraph-based text images.
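To make the feature extraction and classification stages more concrete, the following is a minimal sketch and not the article's implementation. It assumes the classic Freeman 8-direction chain code rather than the agent-based variant proposed in the article, and it substitutes a fixed-order context model with add-one smoothing for PPM, which additionally blends context orders through an escape mechanism. Thinning and character segmentation are omitted. All function names, the `order` and `alphabet_size` parameters, and the `horizontal` and `vertical` labels are hypothetical.

```python
import math
from collections import Counter

# Freeman 8-direction codes keyed by (d_row, d_col), with rows growing downward:
# 0 = east, 1 = north-east, 2 = north, ..., 7 = south-east.
DIRECTIONS = {
    (0, 1): 0, (-1, 1): 1, (-1, 0): 2, (-1, -1): 3,
    (0, -1): 4, (1, -1): 5, (1, 0): 6, (1, 1): 7,
}


def freeman_chain_code(path):
    """Encode an ordered, 8-connected pixel path as a string of direction codes."""
    codes = []
    for (r0, c0), (r1, c1) in zip(path, path[1:]):
        step = (r1 - r0, c1 - c0)
        if step not in DIRECTIONS:
            raise ValueError(f"pixels {(r0, c0)} and {(r1, c1)} are not 8-neighbours")
        codes.append(str(DIRECTIONS[step]))
    return "".join(codes)


def train(samples, order=2):
    """Count how often each symbol follows each context of length `order`."""
    counts = Counter()  # (context, symbol) -> occurrences
    totals = Counter()  # context -> occurrences
    for sample in samples:
        padded = "^" * order + sample  # "^" pads the start of every sequence
        for i in range(order, len(padded)):
            context = padded[i - order:i]
            counts[context, padded[i]] += 1
            totals[context] += 1
    return counts, totals


def code_length(sequence, model, order=2, alphabet_size=8):
    """Bits needed to encode `sequence` under `model` (stand-in for PPM)."""
    counts, totals = model
    padded = "^" * order + sequence
    bits = 0.0
    for i in range(order, len(padded)):
        context = padded[i - order:i]
        # Add-one smoothing keeps unseen symbols from getting zero probability.
        p = (counts[context, padded[i]] + 1) / (totals[context] + alphabet_size)
        bits -= math.log2(p)
    return bits


def classify(sequence, models):
    """Pick the label whose model encodes the sequence in the fewest bits."""
    return min(models, key=lambda label: code_length(sequence, models[label]))


# Hypothetical toy data: chain codes of straight strokes in two directions.
models = {
    "horizontal": train(["0000", "00000"]),
    "vertical": train(["2222", "22222"]),
}
chain = freeman_chain_code([(2, 0), (2, 1), (1, 2), (0, 2)])
label = classify("000", models)
```

The classifier follows the minimum code length idea behind compression-based classification: each class has its own model, and the sample is assigned to the class whose model compresses it best. A real PPM model would replace `train` and `code_length` while `classify` could stay unchanged.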

Author information

Authors and Affiliations

Department of Computer Science, Community College, University of Tabuk, Tabuk, Saudi Arabia
Mansoor A. Al Ghamdi

Authors

Mansoor A. Al Ghamdi

Corresponding author

Correspondence to Mansoor A. Al Ghamdi.

About this article

Cite this article

Al Ghamdi, M.A. A Novel Approach to Printed Arabic Optical Character Recognition. Arab J Sci Eng 47, 2219–2237 (2022). https://doi.org/10.1007/s13369-021-06163-9

Received: 08 June 2021
Accepted: 30 August 2021
Published: 20 September 2021
Issue Date: February 2022
DOI: https://doi.org/10.1007/s13369-021-06163-9
