UHTelPCC: A Dataset for Telugu Printed Character Recognition

Kummari, Rakesh; Bhagvati, Chakravarthy

doi:10.1007/978-981-13-9187-3_3

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1037)

Included in the following conference series:

International Conference on Recent Trends in Image Processing and Pattern Recognition

Abstract

This paper describes the creation and characteristics of UHTelPCC, a dataset for Telugu printed character recognition. The dataset is built from characters extracted from images of Telugu texts printed between 1950 and 1990, so it is hoped that it provides a basis for developing practical Telugu OCR systems. UHTelPCC is intended to provide a standard benchmark for comparing Telugu OCR algorithms and to support research and development of Telugu OCR systems. It contains 70K samples from 325 classes, divided into training, validation, and test sets of 50K, 10K, and 10K samples respectively. It is hoped that UHTelPCC serves Telugu printed character recognition as MNIST serves handwritten digit recognition. The baseline performances on the test set with KNN, MLP, and CNN classifiers are 98.85%, 99.52%, and 99.68% respectively. UHTelPCC is available at http://scis.uohyd.ac.in/~chakcs/UHTelPCC.html.
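As an illustration only, the split sizes and a nearest-neighbour baseline of the kind named in the abstract might be sketched as follows. The abstract does not say how the split was drawn, which distance or value of k the KNN baseline used, or how samples are stored, so the random shuffle, the Hamming distance, the accuracy metric, the function names `split_indices`, `knn_predict`, and `accuracy`, and the toy binary vectors are all assumptions made for this sketch. It is not the authors' code.

```python
import random
from collections import Counter


def split_indices(n_total, n_train, n_val, n_test, seed=0):
    """Shuffle range(n_total) and cut it into three disjoint index lists."""
    if n_train + n_val + n_test != n_total:
        raise ValueError("split sizes must sum to n_total")
    idx = list(range(n_total))
    random.Random(seed).shuffle(idx)  # assumption: a seeded random split
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def knn_predict(train_x, train_y, query, k=1):
    """Majority vote among the k training vectors nearest to query.

    Distance is the Hamming distance between equal-length binary tuples
    (an assumption; the abstract does not name a distance).
    """
    dists = sorted(
        (sum(a != b for a, b in zip(x, query)), i)
        for i, x in enumerate(train_x)
    )
    votes = Counter(train_y[i] for _, i in dists[:k])
    return votes.most_common(1)[0][0]


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)


# Sizes reported in the abstract: 70K samples split 50K / 10K / 10K.
train_idx, val_idx, test_idx = split_indices(70_000, 50_000, 10_000, 10_000)

# Toy stand-in data (NOT UHTelPCC): four 4-pixel binary "glyphs", two classes.
toy_x = [(0, 0, 0, 0), (0, 0, 0, 1), (1, 1, 1, 1), (1, 1, 1, 0)]
toy_y = ["a", "a", "b", "b"]
toy_queries = [(0, 0, 1, 0), (1, 1, 0, 1)]
toy_pred = [knn_predict(toy_x, toy_y, q) for q in toy_queries]
toy_acc = accuracy(toy_pred, ["a", "b"])
```

With the real dataset, the toy vectors would be replaced by the 50K training samples and the queries by the 10K test samples. The validation set would then be the natural place to choose k before touching the test set.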

Acknowledgment

We thank Amit Patel for his efforts in labeling connected components. The first author acknowledges the financial support received from the Council of Scientific and Industrial Research (CSIR), Government of India in the form of a Junior Research Fellowship.

Author information

Authors and Affiliations

School of Computer and Information Sciences, University of Hyderabad, Hyderabad, 500046, India
Rakesh Kummari & Chakravarthy Bhagvati

Corresponding authors

Correspondence to Rakesh Kummari or Chakravarthy Bhagvati.

Editor information

Editors and Affiliations

Department of Computer Science, University of South Dakota, Vermillion, SD, USA
K. C. Santosh
Solapur University, Solapur, India
Ravindra S. Hegadi

About this paper

Cite this paper

Kummari, R., Bhagvati, C. (2019). UHTelPCC: A Dataset for Telugu Printed Character Recognition. In: Santosh, K., Hegadi, R. (eds) Recent Trends in Image Processing and Pattern Recognition. RTIP2R 2018. Communications in Computer and Information Science, vol 1037. Springer, Singapore. https://doi.org/10.1007/978-981-13-9187-3_3

DOI: https://doi.org/10.1007/978-981-13-9187-3_3
Published: 17 July 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-9186-6
Online ISBN: 978-981-13-9187-3
eBook Packages: Computer Science, Computer Science (R0)