
An experimental comparison of clustering methods for content-based indexing of large image databases

  • Survey
  • Published in: Pattern Analysis and Applications

Abstract

In recent years, the spread of acquisition devices such as digital cameras, advances in the storage and transmission of multimedia documents, and the rise of tablet computers have facilitated the creation of many large image databases and eased user interaction with them. This increases the need for efficient and robust methods for finding information in these huge masses of data, including feature extraction methods and feature space structuring methods. Feature extraction methods aim to extract, for each image, one or more visual signatures representing its content. Feature space structuring methods organize the indexed images in order to facilitate, accelerate and improve the results of subsequent retrieval. Clustering is one kind of feature space structuring method. There are different types of clustering, such as hierarchical, density-based and grid-based clustering. In an interactive context where the user may modify the automatic clustering results, incrementality and hierarchical structuring are properties of growing interest for clustering algorithms. In this article, we propose an experimental comparison of different clustering methods for structuring large image databases, using a rigorous experimental protocol. We use image databases of increasing size (Wang, PascalVoc2006, Caltech101, Corel30k) to study the scalability of the different approaches.


Notes

  1. The medoid is the object of a cluster whose average distance to the other objects in that cluster is minimal.

  2. http://wang.ist.psu.edu/docs/related/.

  3. http://pascallin.ecs.soton.ac.uk/challenges/VOC/.

  4. http://www.vision.caltech.edu/Image_Datasets/Caltech101/.

  5. http://www.cs.ubc.ca/~lowe/keypoints/.

  6. http://staff.science.uva.nl/~ksande/research/colordescriptors/.
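The medoid definition in note 1 can be illustrated with a minimal sketch. It assumes objects are numeric feature vectors compared with the Euclidean distance; the article preview does not state which distance is used, and the function name `medoid` and the sample data are hypothetical.

```python
import math


def medoid(points):
    """Return the object whose average distance to the other objects is minimal.

    Assumption: objects are equal-length numeric tuples compared with the
    Euclidean distance. Ties are resolved in favour of the earliest object.
    """
    if not points:
        raise ValueError("cluster must contain at least one object")
    if len(points) == 1:
        return points[0]

    def average_distance(i):
        # Sum distances from object i to every other object in the cluster.
        total = sum(
            math.dist(points[i], other)
            for j, other in enumerate(points)
            if j != i
        )
        return total / (len(points) - 1)

    best_index = min(range(len(points)), key=average_distance)
    return points[best_index]


# Illustrative cluster: four nearby objects and one distant object.
cluster = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (1.0, 1.0), (8.0, 8.0)]
print(medoid(cluster))
```

Unlike a centroid, the medoid is always an actual member of the cluster, so it can be shown to a user as a representative image. This direct computation compares every pair of objects, so its cost grows quadratically with the cluster size.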


Acknowledgment

Grateful acknowledgment is made for financial support by the Poitou-Charentes Region (France).

Corresponding author

Correspondence to Hien Phuong Lai.


Cite this article

Lai, H.P., Visani, M., Boucher, A. et al. An experimental comparison of clustering methods for content-based indexing of large image databases. Pattern Anal Applic 15, 345–366 (2012). https://doi.org/10.1007/s10044-011-0261-7


  • DOI: https://doi.org/10.1007/s10044-011-0261-7
