Printed Arabic Text Recognition

  • Irfan Ahmed
  • Sabri A. Mahmoud
  • Mohammed Tanvir Parvez

Abstract

This chapter addresses the automatic recognition of printed Arabic text. Arabic text poses particular difficulties for recognition because the script is cursive, characters overlap, and the script uses a large number of dots and diacritics. We present a general framework for a printed Arabic text recognition system and then discuss its phases, such as pre-processing, feature extraction, and classification, reviewing the techniques reported for each phase. We also discuss the databases available for printed Arabic text recognition. The chapter concludes with several experimental results for printed Arabic text recognition based on hidden Markov models (HMMs).

Keywords

Hexagonal

Copyright information

© Springer-Verlag London 2012

Authors and Affiliations

  • Irfan Ahmed (1)
  • Sabri A. Mahmoud (1)
  • Mohammed Tanvir Parvez (1)

  1. Information and Computer Science, King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia
