Printed Arabic Text Recognition

Guide to OCR for Arabic Scripts

Abstract

This chapter addresses the automatic recognition of printed Arabic text. Arabic text recognition poses its own difficulties owing to the cursive nature of the script, overlapping characters, and the large number of dots and diacritics. We present a general framework for a printed Arabic text recognition system and then discuss its phases, including pre-processing, feature extraction, and classification, together with the techniques reported for each phase. We also discuss different databases for printed Arabic text recognition. We conclude the chapter by presenting several experimental results for hidden Markov model (HMM)-based printed Arabic text recognition.

Author information

Correspondence to Irfan Ahmed.

Copyright information

© 2012 Springer-Verlag London

About this chapter

Cite this chapter

Ahmed, I., Mahmoud, S.A., Parvez, M.T. (2012). Printed Arabic Text Recognition. In: Märgner, V., El Abed, H. (eds) Guide to OCR for Arabic Scripts. Springer, London. https://doi.org/10.1007/978-1-4471-4072-6_7

  • DOI: https://doi.org/10.1007/978-1-4471-4072-6_7

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-4071-9

  • Online ISBN: 978-1-4471-4072-6

  • eBook Packages: Computer Science (R0)
