On Multifont Character Classification in Telugu

Rasagna, Venkat; Jinesh, K. J.; Jawahar, C. V.

doi:10.1007/978-3-642-19403-0_14

Part of the book series: Communications in Computer and Information Science (CCIS, volume 139)

Included in the following conference series:

International Conference on Information Systems for Indian Languages

Abstract

A major requirement in the design of robust OCR systems is a feature extraction scheme that is invariant to the popular fonts used in print. Many statistical and structural features have been tried for character classification in the past. Motivated by recent successes in the object category recognition literature, we use a spatial extension of the histogram of oriented gradients (HOG) for character classification. Our experiments are conducted on 1,453,950 Telugu character samples in 359 classes and 15 fonts. On this data set, we obtain an accuracy of 96–98% with an SVM classifier.
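
The abstract does not spell out how the HOG descriptor is extended spatially, so the following is only a minimal sketch of one plausible reading: orientation histograms are computed over a pyramid of progressively finer grids (1x1, 2x2, 4x4) and concatenated. The function names, the grid levels, the nine unsigned orientation bins, and the L2 normalisation are all assumptions made for illustration, not details taken from the paper. The resulting vector is the kind of input that could then be passed to an SVM classifier.

```python
import math

def cell_histogram(mag, ang, r0, r1, c0, c1, n_bins):
    """Magnitude-weighted histogram of unsigned gradient orientations in one cell."""
    hist = [0.0] * n_bins
    for r in range(r0, r1):
        for c in range(c0, c1):
            # Map an angle in [0, pi) to a bin; clamp guards against rounding.
            b = min(int(ang[r][c] / math.pi * n_bins), n_bins - 1)
            hist[b] += mag[r][c]
    return hist

def spatial_hog(img, levels=(1, 2, 4), n_bins=9):
    """Concatenate HOG cell histograms over a pyramid of grids (hypothetical layout).

    img is a list of rows of grey values; levels and n_bins are illustrative
    defaults, not parameters reported in the paper.
    """
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            # Central differences, with border pixels replicated.
            gx = img[r][min(c + 1, w - 1)] - img[r][max(c - 1, 0)]
            gy = img[min(r + 1, h - 1)][c] - img[max(r - 1, 0)][c]
            mag[r][c] = math.hypot(gx, gy)
            ang[r][c] = math.atan2(gy, gx) % math.pi  # unsigned orientation
    feat = []
    for g in levels:  # one g x g grid per pyramid level
        for i in range(g):
            for j in range(g):
                feat.extend(cell_histogram(
                    mag, ang,
                    i * h // g, (i + 1) * h // g,
                    j * w // g, (j + 1) * w // g,
                    n_bins))
    # L2-normalise the whole descriptor; an all-zero image stays all zero.
    norm = math.sqrt(sum(v * v for v in feat)) or 1.0
    return [v / norm for v in feat]

# Usage: a toy 8x8 glyph containing a single vertical stroke.
glyph = [[1.0 if 3 <= c <= 4 else 0.0 for c in range(8)] for _ in range(8)]
descriptor = spatial_hog(glyph)
```

With these illustrative defaults the descriptor has 9 x (1 + 4 + 16) = 189 dimensions. The coarse level summarises the overall stroke orientation of the glyph, while the finer levels record where in the glyph each orientation occurs.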

Author information

Authors and Affiliations

International Institute of Information Technology, Hyderabad, 500032, India
Venkat Rasagna, K. J. Jinesh & C. V. Jawahar

Editor information

Editors and Affiliations

Department of Computer Science, Punjabi University, Patiala, India
Chandan Singh, Gurpreet Singh Lehal, Jyotsna Sengupta, Dharam Veer Sharma & Vishal Goyal

About this paper

Cite this paper

Rasagna, V., Jinesh, K.J., Jawahar, C.V. (2011). On Multifont Character Classification in Telugu. In: Singh, C., Singh Lehal, G., Sengupta, J., Sharma, D.V., Goyal, V. (eds) Information Systems for Indian Languages. ICISIL 2011. Communications in Computer and Information Science, vol 139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19403-0_14

DOI: https://doi.org/10.1007/978-3-642-19403-0_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19402-3
Online ISBN: 978-3-642-19403-0
eBook Packages: Computer Science (R0)
