Abstract
This chapter addresses the automatic recognition of printed Arabic text. Arabic text recognition poses particular difficulties owing to the cursive nature of the script, overlapping characters, and the large number of dots and diacritics, among other factors. We present a general framework for a printed Arabic text recognition system and then discuss the phases of such a system, including pre-processing, feature extraction, and classification, along with techniques reported in the literature for each phase. We also discuss databases available for printed Arabic text recognition. The chapter concludes with several experimental results for hidden Markov model (HMM)-based printed Arabic text recognition.
© 2012 Springer-Verlag London
About this chapter
Cite this chapter
Ahmed, I., Mahmoud, S.A., Parvez, M.T. (2012). Printed Arabic Text Recognition. In: Märgner, V., El Abed, H. (eds) Guide to OCR for Arabic Scripts. Springer, London. https://doi.org/10.1007/978-1-4471-4072-6_7
DOI: https://doi.org/10.1007/978-1-4471-4072-6_7
Publisher Name: Springer, London
Print ISBN: 978-1-4471-4071-9
Online ISBN: 978-1-4471-4072-6
eBook Packages: Computer Science (R0)