Abstract
Script identification in multilingual text images can improve the efficiency of many real-world applications, such as document sorting, transcription of multilingual documents, and OCR. In this paper, we present a technique for identifying three scripts: Devanagari, Gurmukhi, and Roman. The script of a text is identified from statistical features, namely zoning features, diagonal features, intersection and open-end-point features, peak-extent features, and combinations of these. For classification, we use three techniques: Support Vector Machine (SVM), k-Nearest Neighbour (k-NN), and Convolutional Neural Network (CNN). With 5-fold cross-validation on isolated offline handwritten characters, the proposed strategy using CNN attains an average identification rate of 93.64% across the three scripts.
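As a rough illustration of two ideas named in the abstract, the sketch below computes zoning features as the foreground-pixel density of each zone in a binarised character image, then scores a simple k-NN classifier with 5-fold cross-validation. It is not the authors' implementation. The function names, the zone grid size, the value of k, and the interleaved fold assignment are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter


def zoning_features(image, zones_per_side=4):
    """Return the foreground-pixel density of each zone, row by row.

    `image` is a square binary image given as a list of rows of 0/1 values.
    The zone grid size is an assumption; the abstract does not state it.
    """
    size = len(image)
    if size % zones_per_side != 0 or any(len(row) != size for row in image):
        raise ValueError("image must be square and divisible into equal zones")
    step = size // zones_per_side
    features = []
    for top in range(0, size, step):
        for left in range(0, size, step):
            on_pixels = sum(
                image[r][c]
                for r in range(top, top + step)
                for c in range(left, left + step)
            )
            features.append(on_pixels / (step * step))
    return features


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training vectors (squared Euclidean)."""
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), label)
        for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def k_fold_accuracy(xs, ys, folds=5, k=3):
    """Mean k-NN accuracy over `folds` folds; sample i is held out in fold i % folds."""
    accuracies = []
    for fold in range(folds):
        test_idx = [i for i in range(len(xs)) if i % folds == fold]
        train_idx = [i for i in range(len(xs)) if i % folds != fold]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        correct = sum(
            knn_predict(train_x, train_y, xs[i], k) == ys[i] for i in test_idx
        )
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / folds
```

In practice, the feature vectors from `zoning_features` would be concatenated with the other feature families before classification, and each fold would normally be stratified by script; both details are left out here to keep the sketch short.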
Copyright information
© 2017 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Bhardwaj, A., Jindal, S.R. (2017). Script Identification from Offline Handwritten Characters Using Combination of Features. In: Deep, K., et al. Proceedings of Sixth International Conference on Soft Computing for Problem Solving. Advances in Intelligent Systems and Computing, vol 547. Springer, Singapore. https://doi.org/10.1007/978-981-10-3325-4_17
DOI: https://doi.org/10.1007/978-981-10-3325-4_17
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3324-7
Online ISBN: 978-981-10-3325-4
eBook Packages: Engineering, Engineering (R0)