Abstract
Over the years, many clustering algorithms have been developed to handle different issues, such as cluster shape, density, or noise. Most clustering algorithms require, implicitly or explicitly, a similarity measure between patterns. Although most use pairwise distances between patterns, e.g., the Euclidean distance, better results can be achieved with other measures. The dissimilarity increment is a high-order dissimilarity measure that uses information from triplets of nearest-neighbor patterns. The distribution of this measure (DID) was recently derived under the hypothesis of local Gaussian generative models, leading to new clustering algorithms. DID-based algorithms build upon an initial data partition, and different initializations produce different final partitions. To overcome this issue, we present a unifying approach based on a strategy that combines the partitions obtained from all these different initializations. Even though this yields a robust partition of the data, one must still select a clustering algorithm to extract the final partition. We therefore also present a DID-based validation criterion to select the best final partition, which consists in estimating graph probabilities for each cluster based on the DID.
References
Agrawal R, Gehrke J, Gunopulos D, Raghavan P (1998) Automatic subspace clustering of high dimensional data for data mining applications. In: Haas LM, Tiwary A (eds) Proceedings of the ACM SIGMOD international conference on management of data (SIGMOD 1998). ACM Press, Seattle, Washington, USA, pp 94–105
Aidos H, Fred A (2012) Statistical modeling of dissimilarity increments for d-dimensional data: Application in partitional clustering. Pattern Recogn 45(9):3061–3071
Anderson TW (1962) On the distribution of the two-sample Cramér-von-Mises criterion. Ann Math Stat 33(3):1148–1159
Ayad HG, Kamel MS (2008) Cumulative voting consensus method for partitions with variable number of clusters. IEEE Trans Pattern Anal Mach Intell 30(1):160–173
Ball GH, Hall DJ (1965) ISODATA, a novel method of data analysis and pattern classification. Tech. rep., Stanford Research Institute
Benavent AP, Ruiz FE, Martínez JMS (2006) EBEM: An entropy-based EM algorithm for Gaussian mixture models. In: 18th International Conference on Pattern Recognition (ICPR 2006), IEEE Computer Society, Hong Kong, vol 2, pp 451–455
Castro RM, Coates MJ, Nowak RD (2004) Likelihood based hierarchical clustering. IEEE Trans Signal Process 52(8):2308–2321
Celebi ME, Kingravi HA (2012) Deterministic initialization of the k-means algorithm using hierarchical clustering. Int J Pattern Recogn Artif Intell 26(7) DOI: 10.1142/S0218001412500188
Celebi ME, Kingravi HA, Vela PA (2013) A comparative study of efficient initialization methods for the k-means clustering algorithm. Expert Syst Appl 40(1):200–210
Dimitriadou E, Weingessel A, Hornik K (2002) A combination scheme for fuzzy clustering. In: Pal NR, Sugeno M (eds) Advances in soft computing - proceedings international conference on fuzzy systems (AFSS 2002). Lecture Notes in Computer Science, vol 2275. Springer, Calcutta, pp 332–338
Duflou H, Maenhaut W (1990) Application of principal component and cluster analysis to the study of the distribution of minor and trace elements in normal human brain. Chemometr Intell Lab Syst 9:273–286
Ester M, Kriegel HP, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Simoudis E, Han J, Fayyad UM (eds) Proceedings of the 2nd international conference on knowledge discovery and data mining (KDD 1996). AAAI Press, Portland, Oregon, USA, pp 226–231
Everitt BS, Landau S, Leese M, Stahl D (2011) Cluster analysis, 5th edn. Wiley
Fern XZ, Brodley CE (2004) Solving cluster ensemble problems by bipartite graph partitioning. In: Brodley CE (ed) Machine learning - proceedings of the 21st international conference (ICML 2004), ACM, Banff, Alberta, Canada, ACM International Conference Proceeding Series, vol 69
Figueiredo MAT, Jain AK (2002) Unsupervised learning of finite mixture models. IEEE Trans Pattern Anal Mach Intell 24(3):381–396
Fred A (2001) Finding consistent clusters in data partitions. In: Kittler J, Roli F (eds) Multiple classifier systems - proceedings 2nd international workshop (MCS 2001). Lecture Notes in Computer Science, vol 2096. Springer, Cambridge, pp 309–318
Fred A, Jain A (2005) Combining multiple clusterings using evidence accumulation. IEEE Trans Pattern Anal Mach Intell 27(6):835–850
Fred A, Jain A (2008) Cluster validation using a probabilistic attributed graph. In: 19th international conference on pattern recognition (ICPR 2008), IEEE, Tampa, Florida, USA, pp 1–4
Fred A, Leitão J (2003) A new cluster isolation criterion based on dissimilarity increments. IEEE Trans Pattern Anal Mach Intell 25(8):944–958
Fred A, Lourenço A, Aidos H, Bulò SR, Rebagliati N, Figueiredo M, Pelillo M (2013) Similarity-based pattern analysis and recognition, chap Learning similarities from examples under the evidence accumulation clustering paradigm. Springer, New York, pp 85–117
Gowda KC, Ravi TV (1995) Divisive clustering of symbolic objects using the concepts of both similarity and dissimilarity. Pattern Recogn 28(8):1277–1282
Guha S, Rastogi R, Shim K (1998) CURE: An efficient clustering algorithm for large datasets. In: Haas LM, Tiwary A (eds) Proceedings of the ACM SIGMOD international conference on management of data (SIGMOD 1998). ACM Press, Seattle, Washington, USA, pp 73–84
Guha S, Rastogi R, Shim K (1999) ROCK: A robust clustering algorithm for categorical attributes. In: Kitsuregawa M, Papazoglou MP, Pu C (eds) Proceedings of the 15th international conference on data engineering (ICDE 1999). IEEE Computer Society, Sydney, Australia, pp 512–521
Hartigan JA, Wong MA (1979) Algorithm AS 136: A k-means clustering algorithm. J Roy Stat Soc C (Appl Stat) 28(1):100–108
Jain AK (2010) Data clustering: 50 years beyond k-means. Pattern Recogn Lett 31:651–666
Jain AK, Duin RPW, Mao J (2000) Statistical pattern recognition: A review. IEEE Trans Pattern Anal Mach Intell 22(1):4–37
Jain AK, Murty MN, Flynn PJ (1999) Data clustering: a review. ACM Comput Surv 31(3):264–323
Johnson NL, Kotz S, Balakrishnan N (1994) Continuous univariate distributions. Applied probability and statistics, vol 1, 2nd edn. Wiley, New York
Johnson SC (1967) Hierarchical clustering schemes. Psychometrika 32(3):241–254
Kamvar S, Klein D, Manning C (2002) Interpreting and extending classical agglomerative clustering algorithms using a model-based approach. In: Sammut C, Hoffmann AG (eds) Machine learning - proceedings of the 19th international conference (ICML 2002). Morgan Kaufmann, Sydney, Australia, pp 283–290
Kannan R, Vempala S, Vetta A (2000) On clusterings – good, bad and spectral. In: 41st annual symposium on foundations of computer science (FOCS 2000). IEEE Computer Society, Redondo Beach, CA, USA, pp 367–377
Karypis G, Han EH, Kumar V (1999) Chameleon: Hierarchical clustering using dynamic modeling. Computer 32(8):68–75
Kaufman L, Rousseeuw PJ (2005) Finding groups in data: an introduction to cluster analysis. Wiley
Kuncheva LI, Hadjitodorov ST (2004) Using diversity in cluster ensembles. In: Proceedings of the IEEE international conference on systems, man & cybernetics, vol 2. IEEE, The Hague, Netherlands, pp 1214–1219
Lance GN, Williams WT (1968) Note on a new information-statistic classificatory program. Comput J 11:195
Lee JA, Verleysen M (2007) Nonlinear dimensionality reduction. Information science and statistics. Springer, New York
Lehmann EL, Romano JP (2005) Testing statistical hypotheses, 3rd edn. Springer texts in statistics. Springer, New York
Lin J (1991) Divergence measures based on the Shannon entropy. IEEE Trans Inform Theory 37(1):145–151
von Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17(4):395–416
MacKay DJ (2006) Information theory, inference, and learning algorithms, 5th edn. Cambridge University Press, Cambridge
MacNaughton-Smith P, Williams WT, Dale MB, Mockett LG (1964) Dissimilarity analysis: a new technique of hierarchical sub-division. Nature 202:1034–1035
Meila M (2003) Comparing clusterings by the variation of information. In: Schölkopf B, Warmuth MK (eds) Computational learning theory and kernel machines - proceedings 16th annual conference on computational learning theory and 7th kernel workshop (COLT 2003). Lecture Notes in Computer Science, vol 2777. Springer, Washington, USA, pp 173–187
Milligan GW, Soon SC, Sokol LM (1983) The effect of cluster size, dimensionality, and the number of clusters on recovery of true cluster structure. IEEE Trans Pattern Anal Mach Intell PAMI-5(1):40–47
Olver FWJ, Lozier DW, Boisvert RF, Clark CW (eds) (2010) NIST handbook of mathematical functions. Cambridge University Press, Cambridge
Sander J, Ester M, Kriegel HP, Xu X (1998) Density-based clustering in spatial databases: the algorithm GDBSCAN and its applications. Data Min Knowl Discov 2:169–194
Strehl A, Ghosh J (2002) Cluster ensembles - a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583–617
Su T, Dy JG (2007) In search of deterministic methods for initializing k-means and Gaussian mixture clustering. Intell Data Anal 11(4):319–338
Theodoridis S, Koutroumbas K (2009) Pattern recognition, 4th edn. Elsevier Academic, Amsterdam
Topchy A, Jain A, Punch W (2003) Combining multiple weak clusterings. In: Proceedings of the 3rd IEEE international conference on data mining (ICDM 2003). IEEE Computer Society, Melbourne, Florida, USA, pp 331–338
Topchy A, Jain A, Punch W (2005) Clustering ensemble: Models of consensus and weak partitions. IEEE Trans Pattern Anal Mach Intell 27(12):1866–1881
Ueda N, Nakano R, Ghahramani Z, Hinton GE (2000) SMEM algorithm for mixture models. Neural Comput 12(9):2109–2128
Vaithyanathan S, Dom B (2000) Model-based hierarchical clustering. In: Boutilier C, Goldszmidt M (eds) Proceedings of the 16th conference in uncertainty in artificial intelligence (UAI 2000). Morgan Kaufmann, Stanford, California, USA, pp 599–608
Wang H, Shan H, Banerjee A (2009) Bayesian cluster ensembles. In: Proceedings of the SIAM international conference on data mining (SDM 2009). SIAM, Sparks, Nevada, USA, pp 211–222
Wang P, Domeniconi C, Laskey KB (2010) Nonparametric Bayesian clustering ensembles. In: Balcázar JL, Bonchi F, Gionis A, Sebag M (eds) Machine learning and knowledge discovery in databases - proceedings european conference: Part III (ECML PKDD 2010). Lecture notes in computer science, vol 6323. Springer, Barcelona, Spain, pp 435–450
Williams WT, Lambert JM (1959) Multivariate methods in plant ecology: 1. association-analysis in plant communities. J Ecol 47(1):83–101
Xu R, Wunsch II D (2005) Survey of clustering algorithms. IEEE Trans Neural Network 16(3):645–678
Acknowledgements
This work was supported by the Portuguese Foundation for Science and Technology grant PTDC/EEI-SII/2312/2012.
Appendix
Assume that X = {x 1, …, x N } is an l-dimensional set of patterns, with \(\mathbf{x}_{i} \sim \mathcal{N}(\boldsymbol{\mu },\varSigma )\). Also, with no loss of generality, assume that \(\boldsymbol{\mu }= 0\) and Σ is diagonal; this only involves a translation and rotation of the data, which does not affect Euclidean distances. If x denotes a sample from this Gaussian, we transform x into x ∗ through a process known as “whitening” or “sphering”, such that its i-th entry is given by \(x_{i}^{{\ast}}\equiv x_{i}/\sqrt{\varSigma _{ii}}\); x i ∗ thus follows the standard normal distribution, \(\mathcal{N}(0,1)\). Now, it is known that the difference between samples from two independent univariate standard normal distributions, such as \(x_{i}^{{\ast}}- y_{i}^{{\ast}}\), follows a normal distribution with variance 2, i.e., \(x_{i}^{{\ast}}- y_{i}^{{\ast}}\sim \mathcal{N}(0,2)\).
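The whitening step and the variance-2 property of coordinate differences can be checked numerically. The following Python sketch is illustrative only; the dimension, diagonal covariance values, sample size, and random seed are arbitrary assumptions, not taken from the chapter:

```python
import numpy as np

rng = np.random.default_rng(0)

def whiten(X, Sigma_diag):
    """Sphere the data: divide each coordinate by its standard deviation,
    so each whitened coordinate follows N(0, 1)."""
    return X / np.sqrt(Sigma_diag)

# Zero-mean Gaussian with diagonal covariance (illustrative values).
l = 3
Sigma_diag = np.array([4.0, 1.0, 0.25])
X = rng.normal(0.0, np.sqrt(Sigma_diag), size=(200000, l))
Y = rng.normal(0.0, np.sqrt(Sigma_diag), size=(200000, l))

Xs, Ys = whiten(X, Sigma_diag), whiten(Y, Sigma_diag)
diff = Xs - Ys  # each coordinate is N(0,1) - N(0,1), hence N(0, 2)
print(np.round(diff.var(axis=0), 2))  # each entry close to 2
```

The per-coordinate variances of the whitened differences concentrate around 2, matching the property used in the derivation.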
It can be shown that the (suitably normalized) squared Euclidean distance between whitened samples, \((d^{{\ast}})^{2} \equiv (d^{{\ast}}(\cdot,\cdot ))^{2} = \frac{1}{2}\sum _{i=1}^{l}(x_{i}^{{\ast}} - y_{i}^{{\ast}})^{2}\), follows a chi-square distribution with l degrees of freedom [28]; the factor 1∕2 standardizes each coordinate difference, which has variance 2. Thus, the probability density function for \((d^{{\ast}})^{2}\) is the chi-square density \(p_{(d^{{\ast}})^{2}}(u) = \frac{u^{l/2-1}e^{-u/2}}{2^{l/2}\,\varGamma (l/2)}\), u ≥ 0, where Γ(⋅ ) denotes the gamma function.
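The chi-square behavior can be sanity-checked by simulation. In this illustrative sketch (the dimension, sample size, and seed are arbitrary choices), coordinate differences are drawn with variance 2 and the halved squared distance is compared against the first two moments of a chi-square variable with l degrees of freedom:

```python
import numpy as np

rng = np.random.default_rng(1)

l, n = 5, 100000
# Differences of whitened samples: each coordinate ~ N(0, 2).
diff = rng.normal(0.0, np.sqrt(2.0), size=(n, l))
sq_dist = (diff ** 2).sum(axis=1)

# Halving the squared distance standardizes each coordinate difference,
# so the result follows a chi-square distribution with l degrees of freedom,
# whose mean is l and whose variance is 2l.
u = sq_dist / 2.0
print(round(u.mean(), 1), round(u.var(), 1))  # close to l and 2l
```

The empirical mean and variance agree with l and 2l, consistent with the stated distribution.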
After the sphering, the transformed data has circular symmetry in \(\mathbb{R}^{l}\). We define angular coordinates on the (l − 1)-sphere, with θ i ∈ [0, π[, i = 1, …, l − 2, and θ l−1 ∈ [0, 2π[. Define \(\mathbf{x} -\mathbf{y} \equiv \left (b_{1},b_{2},\ldots,b_{l}\right )\), where, with d the Euclidean norm of x − y, each b i can be expressed in polar coordinates as \(b_{i} = d\left (\prod _{j=1}^{i-1}\sin \theta _{j}\right )\cos \theta _{i}\) for i = 1, …, l − 1, and \(b_{l} = d\prod _{j=1}^{l-1}\sin \theta _{j}\).
The squared Euclidean distance in the original space, d 2, can then be written as \(d^{2} = A(\varTheta )\,(d^{{\ast}})^{2}\), where A(Θ), with Θ = (θ 1, θ 2, …, θ l−1), is called the expansion factor. Naturally, this expansion factor depends on the angle vector Θ, and it is hard to deal properly with this dependence. Thus, we use the approximation that the expansion factor is constant and equal to the expected value of the true expansion factor over all angles Θ. This expected value is an integral over the (l − 1)-sphere, where the volume element is \(\mathrm{d}_{S^{l-1}}V = \left (\prod _{i=1}^{l-2}\sin ^{l-(i+1)}\theta _{i}\right )\,\mathrm{d}\theta _{1}\ldots \,\mathrm{d}\theta _{l-1}\). Since we sphered the data, we can assume for simplicity that \(\theta _{i} \sim \mathrm{Unif}([0,\pi [)\) for i = 1, …, l − 2 and that \(\theta _{l-1} \sim \mathrm{Unif}([0,2\pi [)\); then \(p_{\theta _{i}}(\theta _{i}) = 1/\pi\) and \(p_{\theta _{l-1}}(\theta _{l-1}) = 1/2\pi\). After some computations (see [2] for details), the expected value of the true expansion factor over all angles Θ is obtained in closed form; we denote it \(\mathbb{E}[A(\varTheta )]\).
Using this expected value, the transformation from the normalized space to the original space is approximated by \(d^{2} \approx \mathbb{E}[A(\varTheta )]\,(d^{{\ast}})^{2}\).
Assume that Y = aX, with a a constant, and let p X (x) be the probability density function of X; then \(p_{Y }(y) = p_{X}(y/a)\,\mathrm{d}x/\mathrm{d}y = p_{X}(y/a) \cdot 1/a\). Applying this scaling to the chi-square density of \((d^{{\ast}})^{2}\), we obtain the probability density function for the squared Euclidean distance in the original space, d 2. Again, assuming that Y 2 = X, with p X (x) the probability density function of X, we have \(p_{Y }(y) = p_{X}(y^{2})\,\mathrm{d}x/\mathrm{d}y = p_{X}(y^{2}) \cdot 2y\). Composing these two transformations, we obtain the probability density function of the Euclidean distance, \(d \equiv d(\mathbf{x},\mathbf{y})\).
The resulting density depends on l and on the expected expansion factor through two constants whose explicit definitions are given in [2].
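The change-of-variables rules used above can be verified numerically. In the sketch below (with arbitrary l, sample size, and seed, and a small helper `chi2_pdf` written for this illustration), X is drawn from a chi-square distribution, Y = √X, and the empirical histogram of Y is compared against the transformed density p_X(y²) · 2y:

```python
import math
import numpy as np

rng = np.random.default_rng(2)

def chi2_pdf(u, l):
    """Chi-square density with l degrees of freedom."""
    return u ** (l / 2 - 1) * np.exp(-u / 2) / (2 ** (l / 2) * math.gamma(l / 2))

l, n = 4, 200000
x = rng.chisquare(l, size=n)   # X ~ chi-square with l degrees of freedom
y = np.sqrt(x)                 # Y^2 = X

# Change of variables: p_Y(y) = p_X(y^2) * |dx/dy| = p_X(y^2) * 2y.
edges = np.linspace(0.5, 3.5, 31)
hist, _ = np.histogram(y, bins=edges, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
pdf = chi2_pdf(centers ** 2, l) * 2 * centers

print(np.abs(hist - pdf).max())  # small: histogram matches the transformed density
```

The close agreement between the histogram and the transformed density confirms the Jacobian factor 2y used in the derivation.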
Now, the dissimilarity increment is defined as the absolute value of the difference of two Euclidean distances. Define d ≡ d(x, y) and d ′ ≡ d(y, z), both following the distribution of Euclidean distances derived above. The probability density function of \(d_{inc} = \vert d - d'\vert\) is given by the convolution-type integral \(p_{d_{inc}}(w) =\int _{0}^{\infty }p_{d}(v)\left [p_{d'}(v + w) + p_{d'}(v - w)\right ]\mathrm{d}v\), w ≥ 0, with \(p_{d'}(u) = 0\) for u < 0.
Therefore, after some calculations (see [2] for details), the probability density function for the dissimilarity increments is obtained in closed form; the resulting expression is written in terms of the incomplete gamma function Γ(a, x) [44].
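For completeness, dissimilarity increments themselves are easy to compute empirically. The brute-force Python sketch below is illustrative only (the function name, dataset, and triplet selection rule are assumptions for this example, not the chapter's code): for each pattern x it takes its nearest neighbor y, then the nearest neighbor z of y excluding x, and returns |d(x, y) − d(y, z)|:

```python
import numpy as np

rng = np.random.default_rng(3)

def dissimilarity_increments(X):
    """For each pattern x: take its nearest neighbor y, then the nearest
    neighbor z of y (excluding x), and return |d(x, y) - d(y, z)|.
    Brute-force O(N^2) version, fine for small samples."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)   # a point is not its own neighbor
    inc = []
    for i in range(len(X)):
        j = int(np.argmin(D[i]))  # y: nearest neighbor of x
        row = D[j].copy()
        row[i] = np.inf           # exclude x when searching from y
        k = int(np.argmin(row))   # z: nearest neighbor of y
        inc.append(abs(D[i, j] - D[j, k]))
    return np.array(inc)

X = rng.normal(size=(300, 2))
inc = dissimilarity_increments(X)
print(inc.min() >= 0.0, inc.shape)  # True (300,)
```

Such an empirical sample of increments is what a DID-based model is fitted to in practice.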
Copyright information
© 2015 Springer International Publishing Switzerland
Cite this chapter
Aidos, H., Fred, A. (2015). Consensus of Clusterings Based on High-Order Dissimilarities. In: Celebi, M. (eds) Partitional Clustering Algorithms. Springer, Cham. https://doi.org/10.1007/978-3-319-09259-1_10
DOI: https://doi.org/10.1007/978-3-319-09259-1_10
Print ISBN: 978-3-319-09258-4
Online ISBN: 978-3-319-09259-1