A Complete Machine-Printed Gurmukhi OCR System

Guide to OCR for Indic Scripts

Part of the book series: Advances in Pattern Recognition (ACVPR)

Abstract

Recognition of Indian language scripts is a challenging problem, and work towards a complete OCR system for these scripts is still in its infancy. Complete OCR systems have recently been developed for the Devanagari and Bangla scripts. However, research on recognition of the Gurmukhi script faces major problems, mainly due to unique characteristics of the script: connectivity of characters on a headline, characters pointing in both horizontal and vertical directions, two or more characters in a word having intersecting minimum bounding rectangles along the horizontal direction, a large set of visually similar character pairs, multi-component characters, touching and broken characters, and horizontally overlapping text segments. This chapter addresses the problems arising at the various stages of developing a complete OCR system for Gurmukhi script and discusses potential solutions. A multi-font Gurmukhi OCR for printed text with a character-level accuracy exceeding 96% is presented. Feature extraction uses a combination of local and global structural features, aimed at capturing the geometrical and topological properties of the characters. For classification, we have implemented a multi-stage scheme in which binary tree and k-nearest neighbor classifiers are used in a hierarchical fashion.
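
The hierarchical use of a binary tree followed by a k-nearest neighbor classifier can be illustrated with a minimal sketch. It is not the chapter's implementation: the structural flags (`has_loop`, `open_top`), the two-dimensional feature vectors, the class labels, and all function names are hypothetical and chosen only to show the two-stage idea. A coarse binary tree first narrows the candidate set, and a k-NN vote then decides among the remaining, visually similar classes.

```python
from collections import Counter
from math import dist

# Hypothetical training samples: (label, coarse structural flags, fine feature vector).
# Flags, vectors, and labels are invented for illustration; they are not from the chapter.
TRAIN = [
    ("A", {"has_loop": True,  "open_top": False}, (0.9, 0.1)),
    ("A", {"has_loop": True,  "open_top": False}, (0.8, 0.2)),
    ("B", {"has_loop": True,  "open_top": False}, (0.2, 0.9)),
    ("B", {"has_loop": True,  "open_top": False}, (0.1, 0.8)),
    ("C", {"has_loop": False, "open_top": True},  (0.5, 0.5)),
    ("C", {"has_loop": False, "open_top": True},  (0.6, 0.4)),
    ("D", {"has_loop": False, "open_top": False}, (0.3, 0.3)),
]


def tree_bucket(flags):
    """Stage 1: a small hand-built binary tree over coarse flags picks a bucket."""
    if flags["has_loop"]:
        return "loop"
    if flags["open_top"]:
        return "open"
    return "other"


def knn(candidates, vector, k=3):
    """Stage 2: majority vote among the k nearest samples within one bucket."""
    nearest = sorted(candidates, key=lambda s: dist(s[2], vector))[:k]
    votes = Counter(label for label, _, _ in nearest)
    return votes.most_common(1)[0][0]


def classify(flags, vector, k=3):
    """Run the tree first, then k-NN restricted to samples in the same bucket."""
    bucket = tree_bucket(flags)
    candidates = [s for s in TRAIN if tree_bucket(s[1]) == bucket]
    if not candidates:
        return None
    return knn(candidates, vector, k)
```

In this sketch the tree never has to separate the similar classes "A" and "B"; it only routes both to the same bucket, and the distance-based vote resolves them. A real system would use many more features and classes at each stage.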

Author information

Corresponding author

Correspondence to G. S. Lehal.

Copyright information

© 2009 Springer-Verlag London Limited

About this chapter

Cite this chapter

Lehal, G.S. (2009). A Complete Machine-Printed Gurmukhi OCR System. In: Govindaraju, V., Setlur, S. (eds) Guide to OCR for Indic Scripts. Advances in Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-84800-330-9_3

  • DOI: https://doi.org/10.1007/978-1-84800-330-9_3

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84800-329-3

  • Online ISBN: 978-1-84800-330-9

  • eBook Packages: Computer Science, Computer Science (R0)
