Abstract
Cluster validity indexes are important tools that serve two purposes: comparing the performance of clustering algorithms and determining the number of clusters that best fits the data. These indexes are generally constructed by combining a measure of compactness with a measure of separation. A classical measure of compactness is the variance. For separation, the distance between cluster centers is used. However, such a distance does not always reflect the quality of the partition between clusters and sometimes gives misleading results. In this paper, we propose a new cluster validity index in which the Jeffrey divergence is used to measure separation between clusters. Experiments are conducted on different types of data, and a comparison with widely used cluster validity indexes shows that the proposed index outperforms them.
Acknowledgments
This publication was made possible by NPRP Grant # 4-1165-2-453 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.
Appendices
Appendix A: parameter estimation
In all the expressions below, \(x\) is a vector of random variables whose mean vector and covariance matrix are given by \(E(x) = \mu\) and \(E((x-\mu )(x-\mu )^{T}) = \Sigma\), where \(E\) denotes the expectation. We use the following matrix properties:
- \(\frac{\partial \mathrm{A}^{T}\cdot x}{\partial x}=\frac{\partial x^{T}\cdot \mathrm{A}}{\partial x}= \mathrm{A}\)
- \(\frac{\partial x^{T}\cdot \mathrm{A}\cdot x}{\partial x}=\left( \mathrm{A}^{T}+\mathrm{A}\right) \cdot x\)
- \(\frac{\partial }{\partial \mathrm{A}} {\text {log}} |\mathrm{A}|= \left( \mathrm{A}^{-1}\right) ^{T}\)
- \(\frac{\partial }{\partial \mathrm{A}} {\text {tr}} \left[ \mathrm{AB}\right] =\frac{\partial }{\partial \mathrm{A}} {\text {tr}} \left[ \mathrm{BA}\right] =\mathrm{B}^{T}\)
The log likelihood of the multivariate Gaussian distribution is given by:
The estimates of the mean and covariance matrix are obtained by computing the derivatives of \(L(x|\mu ,\Sigma )\) with respect to \(\mu\) and \(\Sigma\) and setting them equal to zero.
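For illustration, a standard form of this derivation can be written as follows. It assumes \(N\) independent samples \(x_{1},\ldots ,x_{N}\) of dimension \(d\); the symbols \(N\), \(d\), the sample index \(i\), and the hatted estimates are notation introduced here and may differ from the original presentation.

\[
L(x|\mu ,\Sigma ) = -\frac{Nd}{2}\log (2\pi ) - \frac{N}{2}\log |\Sigma | - \frac{1}{2}\sum _{i=1}^{N}(x_{i}-\mu )^{T}\cdot \Sigma ^{-1}\cdot (x_{i}-\mu )
\]

\[
\frac{\partial L}{\partial \mu } = \Sigma ^{-1}\cdot \sum _{i=1}^{N}(x_{i}-\mu ) = 0 \quad \Rightarrow \quad \hat{\mu } = \frac{1}{N}\sum _{i=1}^{N}x_{i}
\]

\[
\frac{\partial L}{\partial \Sigma ^{-1}} = \frac{N}{2}\Sigma - \frac{1}{2}\sum _{i=1}^{N}(x_{i}-\mu )(x_{i}-\mu )^{T} = 0 \quad \Rightarrow \quad \hat{\Sigma } = \frac{1}{N}\sum _{i=1}^{N}(x_{i}-\hat{\mu })(x_{i}-\hat{\mu })^{T}
\]

The second derivative is taken with respect to \(\Sigma ^{-1}\) rather than \(\Sigma\), which is a common shortcut: it uses \(\log |\Sigma | = -\log |\Sigma ^{-1}|\), the log-determinant property, and the trace property after rewriting each quadratic form as \({\text {tr}}\left[ \Sigma ^{-1}(x_{i}-\mu )(x_{i}-\mu )^{T}\right]\).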
Appendix B: Jeffrey divergence for multivariate Gaussian distribution
We use the following formula and notation:
- \(E(x^{T}\cdot \mathrm{A} \cdot x)= {\text {tr}}(\mathrm{A}\cdot \Sigma )+ \mu ^{T}\cdot \mathrm{A}\cdot \mu\)
- \(\left\langle \cdot \right\rangle\) is the expectation symbol.
where KL is the Kullback-Leibler divergence.
Thus:
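As a numerical illustration, the following Python sketch evaluates the divergence between two multivariate Gaussians. It assumes that the Jeffrey divergence is the symmetrized sum \(KL(p\Vert q)+KL(q\Vert p)\) and relies on the standard closed form of the Kullback-Leibler divergence between Gaussians; the function names `kl_gaussian` and `jeffrey_gaussian` are hypothetical and NumPy is assumed to be available.

```python
import numpy as np


def kl_gaussian(mu0, cov0, mu1, cov1):
    """KL(N(mu0, cov0) || N(mu1, cov1)) using the standard closed form."""
    mu0 = np.asarray(mu0, dtype=float)
    mu1 = np.asarray(mu1, dtype=float)
    cov0 = np.asarray(cov0, dtype=float)
    cov1 = np.asarray(cov1, dtype=float)
    d = mu0.shape[0]
    diff = mu1 - mu0
    cov1_inv = np.linalg.inv(cov1)
    # slogdet returns (sign, log|det|); it is more stable than log(det(.)).
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (
        np.trace(cov1_inv @ cov0)
        + diff @ cov1_inv @ diff
        - d
        + logdet1
        - logdet0
    )


def jeffrey_gaussian(mu0, cov0, mu1, cov1):
    """Symmetrized KL divergence (assumed definition of Jeffrey divergence)."""
    return kl_gaussian(mu0, cov0, mu1, cov1) + kl_gaussian(mu1, cov1, mu0, cov0)


if __name__ == "__main__":
    # Two unit-covariance Gaussians whose means are one unit apart.
    print(jeffrey_gaussian([0.0, 0.0], np.eye(2), [1.0, 0.0], np.eye(2)))
```

The log-determinant terms cancel in the sum, so the symmetrized value depends only on the trace terms and the mean difference. The sketch assumes both covariance matrices are positive definite.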
Appendix C: Similarity measures
The four similarity measures we have used are described in this section. Let \(P1\) and \(P2\) be two partitions. We define \(a\) as the number of object pairs that belong to the same cluster in both \(P1\) and \(P2\). Let \(b\) be the number of object pairs that belong to different clusters in both partitions. Let \(c\) be the number of object pairs that belong to the same cluster in \(P1\) but to different clusters in \(P2\). Finally, let \(d\) be the number of pairs that belong to different clusters in \(P1\) but to the same cluster in \(P2\). The four similarity measures are defined as:
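The pair counts \(a\), \(b\), \(c\) and \(d\) can be computed directly from two label vectors, as in the Python sketch below. The function names are hypothetical, and the two measures shown (the Rand index and the Jaccard coefficient) are common pair-counting measures chosen only as examples; they are not necessarily the four measures used here.

```python
from itertools import combinations


def pair_counts(p1, p2):
    """Return (a, b, c, d) for two partitions given as label sequences."""
    if len(p1) != len(p2):
        raise ValueError("both partitions must label the same objects")
    a = b = c = d = 0
    for i, j in combinations(range(len(p1)), 2):
        same1 = p1[i] == p1[j]
        same2 = p2[i] == p2[j]
        if same1 and same2:
            a += 1  # same cluster in both partitions
        elif not same1 and not same2:
            b += 1  # different clusters in both partitions
        elif same1:
            c += 1  # same in P1, different in P2
        else:
            d += 1  # different in P1, same in P2
    return a, b, c, d


def rand_index(a, b, c, d):
    """Fraction of pairs on which the two partitions agree."""
    return (a + b) / (a + b + c + d)


def jaccard_coefficient(a, b, c, d):
    """Agreement on co-clustered pairs, ignoring pairs separated in both."""
    denom = a + c + d
    # If no pair is co-clustered in either partition, treat them as identical.
    return a / denom if denom else 1.0


if __name__ == "__main__":
    counts = pair_counts([0, 0, 1, 1], [0, 0, 0, 1])
    print(counts, rand_index(*counts), jaccard_coefficient(*counts))
```

The loop visits every pair of objects, so its cost grows quadratically with the number of objects; a contingency-table formulation would scale better for large data sets.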
Cite this article
Said, A.B., Hadjidj, R. & Foufou, S. Cluster validity index based on Jeffrey divergence. Pattern Anal Applic 20, 21–31 (2017). https://doi.org/10.1007/s10044-015-0453-7