Abstract
Recognition of Indian language scripts is a challenging problem, and work towards a complete OCR system for these scripts is still in its infancy. Complete OCR systems have recently been developed for Devanagari and Bangla scripts. However, research on recognition of Gurmukhi script faces major problems, mainly due to the unique characteristics of the script: connectivity of characters on a headline, characters pointing in both horizontal and vertical directions, two or more characters in a word having intersecting minimum bounding rectangles along the horizontal direction, a large set of visually similar character pairs, multi-component characters, touching and broken characters, and horizontally overlapping text segments. This chapter addresses the problems in the various stages of developing a complete OCR system for Gurmukhi script and discusses potential solutions. A multi-font Gurmukhi OCR for printed text with a character-level accuracy exceeding 96% is presented. Feature extraction uses a combination of local and global structural features, aimed at capturing the geometrical and topological properties of the characters. For classification, we have implemented a multi-stage scheme in which binary tree and k-nearest neighbor classifiers are used in a hierarchical fashion.
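The hierarchical idea in the abstract (a binary tree that narrows the candidate set, followed by a k-nearest neighbor vote inside that set) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the two yes/no structural tests, the two-dimensional feature vectors, the class labels "A" to "D", and the function names are hypothetical and are not taken from the chapter.

```python
from collections import Counter
from math import dist

# Hypothetical training samples: (label, structural flags, feature vector).
# flags = (has_headline, has_loop); both tests are illustrative assumptions,
# not the structural features actually used in the chapter.
TRAIN = [
    ("A", (True, True), (0.9, 0.1)),
    ("A", (True, True), (0.8, 0.2)),
    ("B", (True, True), (0.1, 0.9)),
    ("B", (True, True), (0.2, 0.8)),
    ("C", (True, False), (0.5, 0.5)),
    ("C", (True, False), (0.6, 0.4)),
    ("D", (False, False), (0.3, 0.3)),
]


def tree_leaf(flags):
    """Stage 1: walk a small binary tree of yes/no structural tests."""
    has_headline, has_loop = flags
    if has_headline:
        return "headline+loop" if has_loop else "headline"
    return "no-headline"


def knn_vote(candidates, vec, k=3):
    """Stage 2: majority vote among the k nearest candidates (Euclidean)."""
    nearest = sorted(candidates, key=lambda s: dist(s[2], vec))[:k]
    votes = Counter(label for label, _, _ in nearest)
    return votes.most_common(1)[0][0]


def classify(flags, vec, k=3):
    """Restrict to samples in the same tree leaf, then apply k-NN."""
    leaf = tree_leaf(flags)
    candidates = [s for s in TRAIN if tree_leaf(s[1]) == leaf]
    if not candidates:
        return None  # no training sample reached this leaf
    return knn_vote(candidates, vec, k)


if __name__ == "__main__":
    print(classify((True, True), (0.85, 0.15)))
    print(classify((True, False), (0.1, 0.9)))
```

The point of the two-stage layout is that the cheap tree tests prune most classes before any distance is computed, so the nearest-neighbor stage only has to separate the classes that share a leaf.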
Copyright information
© 2009 Springer-Verlag London Limited
About this chapter
Cite this chapter
Lehal, G.S. (2009). A Complete Machine-Printed Gurmukhi OCR System. In: Govindaraju, V., Setlur, S. (eds) Guide to OCR for Indic Scripts. Advances in Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-84800-330-9_3
DOI: https://doi.org/10.1007/978-1-84800-330-9_3
Publisher Name: Springer, London
Print ISBN: 978-1-84800-329-3
Online ISBN: 978-1-84800-330-9
eBook Packages: Computer Science (R0)