Pattern Analysis and Applications, Volume 20, Issue 2, pp 325–364

A texture-based pixel labeling approach for historical books

  • Maroua Mehri
  • Petra Gomez-Krämer
  • Pierre Héroux
  • Alain Boucher
  • Rémy Mullot
Theoretical Advances


Over the last few years, there has been tremendous growth in the automatic processing of digitized historical documents. Finding reliable systems for the interpretation of ancient documents has been a topic of major interest for many libraries and a prime research issue in the document analysis community. One important challenge is to refine well-known approaches based on strong a priori knowledge (e.g., the document image content, layout, typography, font size and type, scanning resolution, and image size). Nevertheless, texture analysis has consistently been chosen to segment a page layout when information about the document structure and content is lacking. Thus, in this article, a framework is proposed to investigate the use of texture as a tool for automatically determining homogeneous regions in a digitized historical book and segmenting its contents by extracting and analyzing texture features independently of the page layout. The proposed framework is parameter-free and applicable to a large variety of ancient books. It does not assume a priori information regarding document image content and structure. It consists of two phases: a texture-based feature extraction step, and an unsupervised clustering and labeling task based on consensus clustering, hierarchical ascendant classification, and nearest neighbor search algorithms. The novelty of this work lies in clustering the extracted texture descriptors to automatically find homogeneous regions, i.e., graphic and textual regions, by applying the clustering approach to an entire book instead of processing each page individually. Our framework has been evaluated on a large variety of historical books and achieved promising results.
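To make the two phases described in the abstract more concrete, the following Python sketch (assuming NumPy) shows how such a pipeline could fit together on a toy image. It computes a simple autocorrelation descriptor over a two-level pyramid for each block, clusters a random sample of blocks with agglomerative (ascendant) hierarchical clustering, and propagates the cluster labels to every block by 1-nearest-neighbor search. The descriptor, the use of Ward's criterion, the block size, lags, sample size, and all function names are hypothetical choices made for illustration. The consensus clustering step is omitted entirely. This is not the authors' published implementation.

```python
import numpy as np


def autocorr_features(block, max_lag=3):
    """Normalized circular autocorrelation at small horizontal/vertical lags.

    Hypothetical descriptor for illustration, not the authors' feature set.
    """
    x = block.astype(float) - block.mean()
    # Wiener-Khinchin theorem: autocorrelation = inverse FFT of the power spectrum.
    acf = np.real(np.fft.ifft2(np.abs(np.fft.fft2(x)) ** 2))
    zero_lag = acf[0, 0]
    if zero_lag <= 1e-12:  # flat block (e.g. blank paper): no texture signal
        return np.zeros(2 * max_lag)
    lags = range(1, max_lag + 1)
    horiz = [acf[0, k] / zero_lag for k in lags]  # shift along columns
    vert = [acf[k, 0] / zero_lag for k in lags]   # shift along rows
    return np.array(horiz + vert)


def multiresolution_features(block, levels=2, max_lag=3):
    """Concatenate descriptors over a dyadic pyramid built by 2x2 mean pooling."""
    feats = []
    for _ in range(levels):
        feats.append(autocorr_features(block, max_lag))
        h, w = (block.shape[0] // 2) * 2, (block.shape[1] // 2) * 2
        block = block[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return np.concatenate(feats)


def ward_hac(X, n_clusters):
    """Naive agglomerative clustering; Ward's criterion is an assumption here."""
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca, cb = X[clusters[a]], X[clusters[b]]
                na, nb = len(ca), len(cb)
                # Increase in within-cluster sum of squares if a and b merge.
                cost = na * nb / (na + nb) * np.sum((ca.mean(0) - cb.mean(0)) ** 2)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] += clusters[b]
        del clusters[b]  # b > a, so index a stays valid
    labels = np.empty(len(X), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


def nn_label(X_train, y_train, X_query):
    """Propagate labels with 1-nearest-neighbor search (Euclidean distance)."""
    d = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    return y_train[d.argmin(axis=1)]


def label_blocks(image, block=16, n_clusters=2, sample_size=40, seed=0):
    """Hypothetical pipeline: describe blocks, cluster a sample, label all blocks."""
    H, W = image.shape
    coords = [(r, c) for r in range(0, H - block + 1, block)
              for c in range(0, W - block + 1, block)]
    X = np.array([multiresolution_features(image[r:r + block, c:c + block])
                  for r, c in coords])
    rng = np.random.default_rng(seed)
    # Assumption: cluster only a random sample, then label every block by 1-NN.
    idx = rng.choice(len(X), size=min(sample_size, len(X)), replace=False)
    sample_labels = ward_hac(X[idx], n_clusters)
    return coords, nn_label(X[idx], sample_labels, X)
```

On a synthetic page whose left half holds horizontal stripes (a crude stand-in for text lines) and whose right half holds uniform noise, the blocks from each half should receive two distinct labels. Clustering only a sample keeps the quadratic-to-cubic cost of the naive agglomerative step small, while the nearest-neighbor pass scales linearly with the number of blocks to label.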


Keywords: Digitized historical books · Pixel labeling · Texture · Autocorrelation · Multiresolution · Purity per block



The support of this research by the ANR (French National Research Agency) under contract ANR-10-CORD-0020 is gratefully acknowledged. The authors would also like to thank Geneviève CRON of the BnF for providing access to the Gallica digital library.



Copyright information

© Springer-Verlag London 2015

Authors and Affiliations

  • Maroua Mehri (1)
  • Petra Gomez-Krämer (1)
  • Pierre Héroux (2)
  • Alain Boucher (1)
  • Rémy Mullot (1)
  1. L3i, University of La Rochelle, La Rochelle, France
  2. LITIS EA 4108, University of Rouen, Saint-Etienne-du-Rouvray, France
