A Complete Machine-Printed Gurmukhi OCR System

  • G. S. Lehal
Chapter
Part of the Advances in Pattern Recognition book series (ACVPR)

Abstract

Recognition of Indian language scripts is a challenging problem, and work towards a complete OCR system for these scripts is still in its infancy. Complete OCR systems have recently been developed for Devanagari and Bangla scripts. Research on recognition of Gurmukhi script, however, faces major problems, mainly due to unique characteristics of the script: connectivity of characters on a headline, characters pointing in both horizontal and vertical directions, two or more characters in a word having intersecting minimum bounding rectangles along the horizontal direction, a large set of visually similar character pairs, multi-component characters, touching and broken characters, and horizontally overlapping text segments. This chapter addresses the problems that arise at the various stages of developing a complete OCR system for Gurmukhi script and discusses potential solutions. A multi-font Gurmukhi OCR for printed text with an accuracy rate exceeding 96% at the character level is presented. Feature extraction uses a combination of local and global structural features, aimed at capturing the geometrical and topological features of the characters. For classification, we have implemented a multi-stage scheme in which binary tree and k-nearest neighbor classifiers are used in a hierarchical fashion.
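To illustrate the general idea of a hierarchical binary tree plus k-nearest neighbor scheme, the following is a minimal Python sketch. It is not the chapter's implementation. The coarse features (`has_sidebar`, `has_loop`), the numeric feature vectors, the class labels "A", "B", "C", and the value of k are all hypothetical, chosen only to show how a first stage might narrow the candidate classes before a second stage votes among nearest neighbors.

```python
from collections import Counter
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Sample:
    label: str          # placeholder class label, not a real Gurmukhi character
    has_sidebar: bool   # hypothetical coarse structural feature
    has_loop: bool      # hypothetical coarse structural feature
    vector: tuple       # hypothetical fine-grained numeric features


def coarse_group(has_sidebar, has_loop):
    """Stage 1: a tiny hand-written binary decision tree over coarse features."""
    if has_sidebar:
        return "sidebar+loop" if has_loop else "sidebar"
    return "loop" if has_loop else "plain"


def knn_label(vector, candidates, k=3):
    """Stage 2: majority vote among the k nearest candidates (Euclidean distance)."""
    nearest = sorted(candidates, key=lambda s: dist(vector, s.vector))[:k]
    votes = Counter(s.label for s in nearest)
    return votes.most_common(1)[0][0]


def classify(has_sidebar, has_loop, vector, training, k=3):
    """Route the sample through the tree, then run k-NN inside the chosen leaf."""
    group = coarse_group(has_sidebar, has_loop)
    candidates = [s for s in training
                  if coarse_group(s.has_sidebar, s.has_loop) == group]
    if not candidates:  # assumption: fall back to all samples if the leaf is empty
        candidates = training
    return knn_label(vector, candidates, k)


# Toy training set with made-up values, for illustration only.
TRAINING = [
    Sample("A", True, False, (0.1, 0.2)),
    Sample("A", True, False, (0.2, 0.1)),
    Sample("B", True, False, (0.9, 0.8)),
    Sample("B", True, False, (0.8, 0.9)),
    Sample("C", False, True, (0.1, 0.2)),
    Sample("C", False, True, (0.2, 0.2)),
]

print(classify(True, False, (0.15, 0.15), TRAINING))  # searched only among A and B
print(classify(False, True, (0.15, 0.15), TRAINING))  # searched only among C
```

In this sketch the same numeric vector is assigned to different classes depending on the leaf chosen in the first stage. That is the point of such a hierarchy: visually similar classes are compared only against each other, and the nearest neighbor search runs over a smaller candidate set.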

Keywords

OCR, Gurmukhi, Segmentation, Classification, Post-processing

Copyright information

© Springer-Verlag London Limited 2009

Authors and Affiliations

  1. Department of Computer Science, Punjabi University, Patiala, India
