OCR of Printed Telugu Text with High Recognition Accuracies

Vasantha Lakshmi, C.; Jain, Ritu; Patvardhan, C.

doi:10.1007/11949619_70

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 4338)

Abstract

Telugu is one of the oldest and most popular languages of India, spoken by more than 66 million people, especially in South India. The development of Optical Character Recognition (OCR) systems for Telugu text is an active area of research.

OCR of Indian scripts is much more complicated than OCR of Roman script because these scripts use a huge number of combinations of characters and modifiers. Basic symbols are identified as the unit of recognition in Telugu script, and edge histograms are used as features in a recognition scheme for these basic symbols. During recognition, it is observed that in many cases the recognizer incorrectly outputs a very similar-looking symbol. Special logic and algorithms based on simple structural features are developed to improve recognition accuracy considerably without much additional computational effort. It is shown that such a procedure achieves a recognition accuracy of 98.5% on laser-quality prints.
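The abstract names the ingredients (edge-histogram features for basic symbols, a recognizer that sometimes confuses look-alikes, and structural features that fix those confusions) but not their exact definitions. The Python sketch below shows one way such a two-stage recognizer could be put together. It assumes a simplified edge histogram modelled on the MPEG-7 edge histogram descriptor (five edge types counted over a 4x4 grid of sub-images), nearest-neighbour matching with L1 distance, and a single hypothetical structural rule (ink in the top band of the glyph). The names `edge_histogram`, `classify`, `top_band_ink`, `recognize`, `CONFUSABLE`, `sym_a` and `sym_b`, and all thresholds, are illustrative and are not taken from the paper.

```python
import math

# Five MPEG-7-style 2x2 edge filters; coefficients for a block [[a, b], [c, d]]
# are given as (a, b, c, d). Assumption: the paper's own edge definition may differ.
SQRT2 = math.sqrt(2.0)
EDGE_FILTERS = [
    (1.0, -1.0, 1.0, -1.0),     # vertical
    (1.0, 1.0, -1.0, -1.0),     # horizontal
    (SQRT2, 0.0, 0.0, -SQRT2),  # 45-degree diagonal
    (0.0, SQRT2, -SQRT2, 0.0),  # 135-degree diagonal
    (2.0, -2.0, -2.0, 2.0),     # non-directional
]

# Hypothetical look-alike pair that the histogram matcher tends to confuse.
CONFUSABLE = [("sym_a", "sym_b")]


def edge_histogram(img, grid=4, threshold=1.0):
    """Return a grid*grid*5 feature vector for a binary glyph (list of rows of 0/1)."""
    height, width = len(img), len(img[0])
    feats = []
    for gy in range(grid):
        for gx in range(grid):
            # Pixel bounds of the current sub-image.
            y0, y1 = gy * height // grid, (gy + 1) * height // grid
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            counts = [0] * len(EDGE_FILTERS)
            blocks = 0
            # Scan non-overlapping 2x2 blocks inside the sub-image.
            for y in range(y0, y1 - 1, 2):
                for x in range(x0, x1 - 1, 2):
                    a, b = img[y][x], img[y][x + 1]
                    c, d = img[y + 1][x], img[y + 1][x + 1]
                    responses = [abs(fa * a + fb * b + fc * c + fd * d)
                                 for fa, fb, fc, fd in EDGE_FILTERS]
                    best = max(range(len(responses)), key=responses.__getitem__)
                    # Blocks with a weak response count as "no edge".
                    if responses[best] >= threshold:
                        counts[best] += 1
                    blocks += 1
            # Normalise so sub-images of different sizes are comparable.
            feats.extend(n / blocks if blocks else 0.0 for n in counts)
    return feats


def classify(img, prototypes):
    """Nearest-neighbour match: label whose prototype has the smallest L1 distance."""
    feats = edge_histogram(img)

    def l1(proto):
        return sum(abs(p - q) for p, q in zip(feats, proto))

    return min(prototypes, key=lambda label: l1(prototypes[label]))


def top_band_ink(img, frac=0.25):
    """A simple structural feature: fraction of foreground pixels in the top band."""
    rows = max(1, int(len(img) * frac))
    band = [px for row in img[:rows] for px in row]
    return sum(1 for px in band if px) / len(band)


def recognize(img, prototypes, ink_threshold=0.05):
    """Histogram match first; re-decide only when the result is a known look-alike."""
    label = classify(img, prototypes)
    for first, second in CONFUSABLE:
        if label in (first, second):
            return first if top_band_ink(img) >= ink_threshold else second
    return label
```

Running the structural check only when the matcher returns a member of a known look-alike pair keeps the extra cost small, which is in line with the abstract's point that the accuracy gain comes without much additional computation.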

Author information

Authors and Affiliations

Dayalbagh Educational Institute, Agra, 282005, India
C. Vasantha Lakshmi, Ritu Jain & C. Patvardhan

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, IIT Delhi, New Delhi, India
Prem K. Kalra
School of Computer Science and Engineering, The Hebrew University of Jerusalem, 91904, Jerusalem, Israel
Shmuel Peleg

About this paper

Cite this paper

Vasantha Lakshmi, C., Jain, R., Patvardhan, C. (2006). OCR of Printed Telugu Text with High Recognition Accuracies. In: Kalra, P.K., Peleg, S. (eds) Computer Vision, Graphics and Image Processing. Lecture Notes in Computer Science, vol 4338. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11949619_70

DOI: https://doi.org/10.1007/11949619_70
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68301-8
Online ISBN: 978-3-540-68302-5
eBook Packages: Computer Science, Computer Science (R0)