Pattern Analysis & Applications, Volume 5, Issue 1, pp 31–45

Off-Line Arabic Character Recognition – A Review

Authors

  • M. S. Khorsheed
    • Computer Laboratory, University of Cambridge, Cambridge, UK
DOI: 10.1007/s100440200004

Cite this article as:
Khorsheed, M. Pattern Anal Appl (2002) 5: 31. doi:10.1007/s100440200004

Abstract:

Off-line recognition requires transferring the text under consideration into an image file. This is the only available way to bring printed material into electronic media. However, the transfer causes the system to lose the temporal information of the text. Other complexities that an off-line recognition system has to deal with are the low resolution of the document and poor binarisation, which can reduce readability when essential features of the characters are deleted or obscured. Recognising Arabic script presents two additional challenges: the orthography is cursive and letter shape is context-sensitive. Certain character combinations form new ligature shapes, which are often font-dependent. Some ligatures involve vertical stacking of characters. Since not all letters connect, locating word boundaries becomes an interesting problem, as spacing may separate not only words but also certain characters within a word. Various techniques have been implemented to achieve high recognition rates, each tackling different aspects of the recognition system. This review is organised into five major sections, covering a general overview, Arabic writing characteristics, Arabic text recognition system, Arabic OCR software and conclusions.

Key words

Arabic OCR; Feature extraction; Fourier Transform; Hidden Markov Models; Horizontal projection; Hough Transform; Neural Networks; Off-line recognition; Preprocessing segmentation; Vertical projection

Copyright information

© Springer-Verlag London Limited 2002