Abstract
This paper addresses script discrimination in old Slavic printed documents and proposes an algorithm for script classification and identification. The algorithm converts the initial document into coded text, which is then subjected to statistical analysis to extract texture features. These features serve as the criteria for script classification and identification. The proposed method is tested on samples of old Slavic printed documents written in Glagolitic, Cyrillic and Latin scripts.
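The pipeline outlined in the abstract (document to coded text, coded text to statistical analysis, statistics to texture features) can be illustrated with a minimal sketch. The abstract does not give the coding scheme or the exact features, so everything specific below is an assumption made for illustration: the three-level letter coding by vertical extent, the `ASCENDERS` and `DESCENDERS` sets, the function names, and the choice of four common co-occurrence statistics (energy, entropy, contrast, homogeneity). It should not be read as the authors' implementation.

```python
import math
from collections import Counter

# Hypothetical coding scheme (an assumption, not taken from the paper):
# each Latin letter gets a code by its vertical extent.
# 0 = short letter, 1 = ascender, 2 = descender.
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")


def encode(text):
    """Turn text into a list of integer codes; non-letters are skipped."""
    codes = []
    for ch in text.lower():
        if not ch.isalpha():
            continue
        if ch in ASCENDERS:
            codes.append(1)
        elif ch in DESCENDERS:
            codes.append(2)
        else:
            codes.append(0)
    return codes


def cooccurrence(codes, levels=3):
    """Normalized co-occurrence matrix of adjacent code pairs."""
    counts = Counter(zip(codes, codes[1:]))
    total = sum(counts.values())
    return [
        [counts[(i, j)] / total if total else 0.0 for j in range(levels)]
        for i in range(levels)
    ]


def texture_features(p):
    """Four common co-occurrence statistics computed from matrix p."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "energy": sum(v * v for _, _, v in cells),
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
    }
```

Under these assumptions, calling `texture_features(cooccurrence(encode(sample)))` on text samples from different scripts would give one feature vector per sample, and those vectors could then be compared or passed to a classifier. A real system for Glagolitic or Cyrillic would need its own letter-to-code mapping.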
Acknowledgments
This work was partially supported by a grant from the Ministry of Education, Science and Technological Development of the Republic of Serbia, as part of projects TR33037 and III43011.
Additional information
Communicated by V. Loia.
About this article
Cite this article
Brodić, D., Milivojević, Z.N. & Maluckov, Č.A. An approach to the script discrimination in the Slavic documents. Soft Comput 19, 2655–2665 (2015). https://doi.org/10.1007/s00500-014-1435-1
DOI: https://doi.org/10.1007/s00500-014-1435-1