
UHTelPCC: A Dataset for Telugu Printed Character Recognition

  • Conference paper
  • First Online:
Recent Trends in Image Processing and Pattern Recognition (RTIP2R 2018)

Abstract

This paper describes how UHTelPCC, a dataset for Telugu printed character recognition, was created and what its characteristics are. The dataset consists of characters extracted from images of Telugu texts printed between 1950 and 1990, so it is hoped that it provides a basis for developing practical Telugu OCR systems. UHTelPCC is intended to serve as a standard benchmark for comparing Telugu OCR algorithms and to support research and development of Telugu OCR systems. It contains 70K samples from 325 classes, divided into training, validation, and test sets of 50K, 10K, and 10K samples, respectively. It is hoped that UHTelPCC will do for Telugu printed character recognition what MNIST, a dataset for handwritten digit recognition, does for digit recognition. Baseline accuracies on the test set using KNN, MLP, and CNN classifiers are 98.85%, 99.52%, and 99.68%, respectively. UHTelPCC is available at http://scis.uohyd.ac.in/~chakcs/UHTelPCC.html.
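The abstract reports a KNN baseline but does not give its settings. Purely as an illustration, a minimal k-nearest-neighbour classifier and the test-set accuracy computation can be sketched in plain Python. The sketch assumes each sample is already a `(feature_vector, label)` pair, for example a flattened, binarized character image; the value `k=3`, the Euclidean distance, and the names `knn_predict` and `accuracy` are hypothetical choices, not the authors' configuration or file format.

```python
import math
from collections import Counter


def knn_predict(train, x, k=3):
    """Predict a label for feature vector x by majority vote among its k
    nearest training samples under Euclidean distance.

    train: list of (feature_vector, label) pairs (assumed format).
    k=3 is an illustrative default, not the paper's setting.
    """
    # Brute-force scan: sort every training sample by its distance to x.
    nearest = sorted(train, key=lambda sample: math.dist(sample[0], x))[:k]
    # Count the labels of the k closest samples and return the most common one.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of test samples whose predicted label matches the true label."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


if __name__ == "__main__":
    # Toy 2-D data standing in for character feature vectors (hypothetical).
    train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
             ((10, 10), "b"), ((10, 11), "b"), ((11, 10), "b")]
    test = [((0.5, 0.5), "a"), ((10.5, 10.5), "b")]
    print(accuracy(train, test))
```

With the training and test sets used as distributed (50K and 10K samples), this brute-force scan compares every test sample against every training sample, which is slow in pure Python; a library implementation such as scikit-learn's `KNeighborsClassifier` would be the practical choice, with the sketch only showing what the baseline computes.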

Acknowledgment

We thank Amit Patel for his efforts in labeling connected components. The first author acknowledges the financial support received from the Council of Scientific and Industrial Research (CSIR), Government of India in the form of a Junior Research Fellowship.

Author information

Corresponding authors

Correspondence to Rakesh Kummari or Chakravarthy Bhagvati.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Kummari, R., Bhagvati, C. (2019). UHTelPCC: A Dataset for Telugu Printed Character Recognition. In: Santosh, K., Hegadi, R. (eds) Recent Trends in Image Processing and Pattern Recognition. RTIP2R 2018. Communications in Computer and Information Science, vol 1037. Springer, Singapore. https://doi.org/10.1007/978-981-13-9187-3_3

  • DOI: https://doi.org/10.1007/978-981-13-9187-3_3

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-9186-6

  • Online ISBN: 978-981-13-9187-3

  • eBook Packages: Computer Science, Computer Science (R0)