Document image characterization using a multiresolution analysis of the texture: application to old documents

  • Nicholas Journet
  • Jean-Yves Ramel
  • Rémy Mullot
  • Véronique Eglin
Original Paper

Abstract

In this article, we propose a method for characterizing images of old documents based on a texture approach. The characterization relies on a multiresolution study of the textures contained in the document images. By extracting five features linked to the frequencies and orientations in the different areas of a page, it is possible to extract and compare elements of high semantic level without making any hypothesis about the physical or logical structure of the analyzed documents. Experiments based on segmentation, data analysis and document image retrieval tools demonstrate the performance of our proposals and the advances they represent in characterizing the content of a deeply heterogeneous corpus.

Keywords

Document image analysis, Texture features, Multiresolution, Digital libraries, Indexing



Copyright information

© Springer-Verlag 2008

Authors and Affiliations

  • Nicholas Journet (1)
  • Jean-Yves Ramel (1)
  • Rémy Mullot (2)
  • Véronique Eglin (3)
  1. LI, Tours, France
  2. L3I, La Rochelle Cedex 1, France
  3. LIRIS, INSA de Lyon, Villeurbanne Cedex, France
