An experimental comparison of clustering methods for content-based indexing of large image databases

Lai, Hien Phuong; Visani, Muriel; Boucher, Alain; Ogier, Jean-Marc

doi:10.1007/s10044-011-0261-7

Survey
Published: 13 January 2012

Volume 15, pages 345–366, (2012)
Pattern Analysis and Applications

Hien Phuong Lai¹,², Muriel Visani¹, Alain Boucher² & Jean-Marc Ogier¹

Abstract

In recent years, the spread of acquisition devices such as digital cameras, advances in storage and transmission techniques for multimedia documents, and the rise of tablet computers have facilitated the creation of many large image databases as well as interaction with their users. This increases the need for efficient and robust methods for finding information in these huge masses of data, including feature extraction methods and feature space structuring methods. Feature extraction methods aim to extract, for each image, one or more visual signatures representing its content. Feature space structuring methods organize the indexed images in order to facilitate, accelerate and improve the results of subsequent retrieval. Clustering is one kind of feature space structuring method. There are different types of clustering, such as hierarchical, density-based and grid-based clustering. In an interactive context where the user may modify the automatic clustering results, incrementality and hierarchical structuring are properties of growing interest for clustering algorithms. In this article, we propose an experimental comparison of different clustering methods for structuring large image databases, using a rigorous experimental protocol. We use image databases of increasing sizes (Wang, PascalVoc2006, Caltech101, Corel30k) to study the scalability of the different approaches.

Notes

The medoid is defined as the cluster object which has the minimal average distance between it and the other objects in the cluster.
http://wang.ist.psu.edu/docs/related/.
http://pascallin.ecs.soton.ac.uk/challenges/VOC/.
http://www.vision.caltech.edu/Image_Datasets/Caltech101/.
http://www.cs.ubc.ca/~lowe/keypoints/.
http://staff.science.uva.nl/~ksande/research/colordescriptors/.
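
The medoid definition in the first note above can be illustrated with a minimal sketch. It assumes that cluster objects are numeric feature vectors compared with the Euclidean distance; the function name `medoid` and the sample data are hypothetical and are not taken from the article.

```python
import math


def medoid(points):
    """Return the object whose average distance to the other objects is minimal.

    Assumption: objects are equal-length numeric tuples and the distance
    is Euclidean; the article may rely on other distances.
    """
    if not points:
        raise ValueError("cluster must contain at least one object")
    if len(points) == 1:
        return points[0]

    def average_distance(i):
        # Average distance from object i to every other object in the cluster.
        total = sum(math.dist(points[i], q)
                    for j, q in enumerate(points) if j != i)
        return total / (len(points) - 1)

    best_index = min(range(len(points)), key=average_distance)
    return points[best_index]


# Hypothetical one-cluster example with an outlier at x = 10.
cluster = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
print(medoid(cluster))  # (2.0, 0.0)
```

Unlike a centroid, the medoid is always one of the cluster's own objects, so the outlier shifts it less than it would shift the mean.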

Acknowledgment

Grateful acknowledgment is made for financial support by the Poitou-Charentes Region (France).

Author information

Authors and Affiliations

L3I, Université de La Rochelle, 17042, La Rochelle cedex 1, France
Hien Phuong Lai, Muriel Visani & Jean-Marc Ogier
IFI, MSI team, IRD, UMI 209 UMMISCO, Vietnam National University, 42 Ta Quang Buu, Hanoi, Vietnam
Hien Phuong Lai & Alain Boucher

Corresponding author

Correspondence to Hien Phuong Lai.

About this article

Cite this article

Lai, H.P., Visani, M., Boucher, A. et al. An experimental comparison of clustering methods for content-based indexing of large image databases. Pattern Anal Applic 15, 345–366 (2012). https://doi.org/10.1007/s10044-011-0261-7

Received: 04 January 2011
Accepted: 27 December 2011
Published: 13 January 2012
Issue Date: November 2012
DOI: https://doi.org/10.1007/s10044-011-0261-7
