OCR of Printed Telugu Text with High Recognition Accuracies

  • C. Vasantha Lakshmi
  • Ritu Jain
  • C. Patvardhan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4338)

Abstract

Telugu is one of the oldest and most widely spoken languages of India, with more than 66 million speakers, mainly in South India. The development of Optical Character Recognition (OCR) systems for Telugu text is an area of current research.

OCR of Indian scripts is considerably more complicated than OCR of Roman script because these scripts use a very large number of combinations of characters and modifiers. In this work, basic symbols are identified as the unit of recognition in Telugu script, and edge histograms are used in a feature-based recognition scheme for these basic symbols. During recognition, it is observed that in many cases the recognizer incorrectly outputs a very similar-looking symbol. Special logic and algorithms based on simple structural features are developed to improve recognition accuracy considerably without much additional computational effort. It is shown that a recognition accuracy of 98.5% can be achieved on laser-quality prints with this procedure.
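As a rough illustration of what a feature-based scheme built on edge histograms can look like, the following Python sketch counts edge orientations in a small binary glyph image and assigns the label of the nearest stored prototype. It is not the authors' implementation: the four orientation classes, the squared Euclidean distance, the nearest-prototype rule, and the names `edge_histogram` and `nearest_symbol` are all assumptions made for illustration, and the 3x3 images merely stand in for segmented basic symbols.

```python
def edge_histogram(img):
    """Normalised counts of [vertical, horizontal, diagonal, anti-diagonal] edges.

    img is a list of rows holding 0 (background) or 1 (foreground).
    The four-class split is an illustrative assumption, not the paper's feature set.
    """
    counts = [0, 0, 0, 0]
    for y in range(len(img) - 1):
        for x in range(len(img[0]) - 1):
            gx = img[y][x + 1] - img[y][x]  # change towards the right neighbour
            gy = img[y + 1][x] - img[y][x]  # change towards the lower neighbour
            if gx and not gy:
                counts[0] += 1              # vertical edge
            elif gy and not gx:
                counts[1] += 1              # horizontal edge
            elif gx and gy:
                counts[2 if gx == gy else 3] += 1  # one of the two diagonals
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * 4


def nearest_symbol(features, prototypes):
    """Label of the prototype closest to features (squared Euclidean distance)."""
    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(features, prototypes[label]))
    return min(prototypes, key=distance)


# Toy stand-ins for basic symbols (hypothetical data).
VERTICAL_BAR = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
HORIZONTAL_BAR = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]

prototypes = {
    "vertical": edge_histogram(VERTICAL_BAR),
    "horizontal": edge_histogram(HORIZONTAL_BAR),
}
noisy_bar = [[0, 1, 0], [0, 1, 1], [0, 1, 0]]
print(nearest_symbol(edge_histogram(noisy_bar), prototypes))
```

A histogram of this kind is cheap to compute but cannot separate symbols whose edge statistics are nearly identical, which is consistent with the abstract's observation that confusions occur between very similar-looking symbols and are resolved by additional structural checks.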

Keywords

Recognition Accuracy; Near Neighbour; Optical Character Recognition; Foreground Pixel; Zero Crossing
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • C. Vasantha Lakshmi (1)
  • Ritu Jain (1)
  • C. Patvardhan (1)
  1. Dayalbagh Educational Institute, Agra, India