Classification of the Scripts in Medieval Documents from Balkan Region by Run-Length Texture Analysis

  • Darko Brodić
  • Alessia Amelio
  • Zoran N. Milivojević
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9489)

Abstract

The paper presents a method for classifying the scripts of medieval documents originating from the Balkan region. It is a multi-step procedure that maps the text according to typographical features, creates equivalent image patterns, applies run-length pattern analysis to establish a feature vector, and classifies that vector with the state-of-the-art method Genetic Algorithms Image Clustering for Document Analysis (GA-ICDA), which successfully discriminates between documents written in different scripts. The proposed method is evaluated on custom-oriented document databases that include handprinted or printed documents written in old Cyrillic, angular and round Glagolitic, ancient Latin, and Greek scripts. The experiment demonstrates very good results.
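
To make the mapping and run-length steps more concrete, here is a minimal Python sketch. It is not the authors' implementation. The four-class letter grouping (base, ascender, descender, full), the Latin-only letter sets, and all function names are assumptions made for illustration. The two features computed, short run emphasis (SRE) and long run emphasis (LRE), are standard run-length statistics. The GA-ICDA classification step is not shown.

```python
from itertools import groupby
import string

# Hypothetical mapping, for illustration only: Latin letters grouped by the
# vertical zones they occupy (0 = base, 1 = ascender, 2 = descender, 3 = full).
ASCENDERS = set("bdfhklt") | set(string.ascii_uppercase)
DESCENDERS = set("gpqy")
FULL = set("j")


def map_text(text):
    """Map each letter to a typographical class code; skip non-letters."""
    codes = []
    for ch in text:
        if not ch.isalpha():
            continue
        if ch in FULL:
            codes.append(3)
        elif ch in ASCENDERS:
            codes.append(1)
        elif ch in DESCENDERS:
            codes.append(2)
        else:
            codes.append(0)
    return codes


def run_length_matrix(codes, levels=4):
    """Build a matrix whose entry [i][j] counts runs of code i with length j + 1."""
    max_run = max((len(list(g)) for _, g in groupby(codes)), default=0)
    matrix = [[0] * max_run for _ in range(levels)]
    for level, group in groupby(codes):
        matrix[level][len(list(group)) - 1] += 1
    return matrix


def run_length_features(matrix):
    """Compute short run emphasis (SRE) and long run emphasis (LRE)."""
    n_runs = sum(sum(row) for row in matrix)
    if n_runs == 0:
        return {"SRE": 0.0, "LRE": 0.0}
    sre = sum(c / (j + 1) ** 2 for row in matrix for j, c in enumerate(row)) / n_runs
    lre = sum(c * (j + 1) ** 2 for row in matrix for j, c in enumerate(row)) / n_runs
    return {"SRE": sre, "LRE": lre}


if __name__ == "__main__":
    codes = map_text("Classification of the scripts")
    print(run_length_features(run_length_matrix(codes)))
```

In this sketch the coded text is treated as a one-dimensional, four-level image, so runs are counted in a single direction only. A fuller feature vector would add further run-length statistics on the same matrix.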

Keywords

Classification, Historical document, Optical character recognition, Pattern recognition, Run-length statistics, Script identification

Notes

Acknowledgments

This work was partially supported by a grant from the Ministry of Science of the Republic of Serbia within the project TR33037.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Darko Brodić (1), corresponding author
  • Alessia Amelio (2)
  • Zoran N. Milivojević (3)
  1. Technical Faculty in Bor, University of Belgrade, Bor, Serbia
  2. Institute for High Performance Computing and Networking, National Research Council of Italy (CNR-ICAR), Rende, Italy
  3. College of Applied Technical Sciences, Niš, Serbia
