Cluster validity index based on Jeffrey divergence

  • Theoretical Advances
Pattern Analysis and Applications

Abstract

Cluster validity indexes are important tools designed for two purposes: comparing the performance of clustering algorithms and determining the number of clusters that best fits the data. These indexes are generally constructed by combining a measure of compactness and a measure of separation. A classical measure of compactness is the variance. For separation, the distance between cluster centers is typically used. However, such a distance does not always reflect the quality of the partition between clusters and sometimes gives misleading results. In this paper, we propose a new cluster validity index in which Jeffrey divergence is used to measure separation between clusters. Experiments are conducted on different types of data, and comparison with widely used cluster validity indexes shows that the proposed index outperforms them.

Notes

  1. http://ida.first.fraunhofer.de/projects/bench/benchmarks.htm.

  2. http://www.vision.jhu.edu/data/hopkins155/.

Acknowledgments

This publication was made possible by NPRP Grant # 4-1165-2-453 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.

Author information

Corresponding author

Correspondence to Ahmed Ben Said.

Appendices

Appendix A: parameter estimation

In all the expressions below, \(x\) is a vector of random variables whose mean vector and covariance matrix are given by \(E(x) = \mu\) and \(E((x-\mu )(x-\mu )^{T}) = \Sigma\), where \(E\) denotes the expectation. We use the following matrix properties:

  • \(\frac{\partial {\rm A}^{T}\cdot x}{\partial x}=\frac{\partial x^{T}\cdot {\rm A}}{\partial x}= \mathrm{A}\)

  • \(\frac{\partial x^{T}\cdot \mathrm{A}\cdot x}{\partial x}=\mathrm{A}^{T}+\mathrm{A}\)

  • \(\frac{\partial }{\partial \mathrm{A}} {\text {log}} |\mathrm{A}|= \left( \mathrm{A}^{-1}\right) ^{T}\)

  • \(\frac{\partial }{\partial \mathrm{A}} {\text {tr}} \left[ \mathrm{AB}\right] =\frac{\partial }{\partial \mathrm{A}} {\text {tr}} \left[ \mathrm{BA}\right] =\mathrm{B}^{T}\)
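The last two properties can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper `numerical_gradient` and the test matrices are illustrative names and values, not part of the original derivation. Each entry of the matrix is perturbed independently, which is the convention under which these identities hold.

```python
import numpy as np

def numerical_gradient(f, A, eps=1e-6):
    """Central finite-difference gradient of a scalar function f with respect to each entry of A."""
    grad = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            step = np.zeros_like(A)
            step[i, j] = eps
            grad[i, j] = (f(A + step) - f(A - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
M = rng.normal(size=(3, 3))
A = M @ M.T + 3 * np.eye(3)   # positive definite, so log|A| is well defined
B = rng.normal(size=(3, 3))

# d/dA log|A| should match (A^{-1})^T
grad_logdet = numerical_gradient(lambda X: np.log(np.linalg.det(X)), A)
# d/dA tr[AB] should match B^T
grad_trace = numerical_gradient(lambda X: np.trace(X @ B), A)

print(np.allclose(grad_logdet, np.linalg.inv(A).T, atol=1e-5))
print(np.allclose(grad_trace, B.T, atol=1e-6))
```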

The log likelihood of the multivariate Gaussian distribution is given by:

$$\begin{aligned} L(x|\mu ,\Sigma )&= \frac{-nD}{2}{\text {log}}(2\pi ) - \frac{n}{2}{\text {log}}(|\Sigma |) \nonumber \\&\quad -\frac{1}{2} \sum _{i=1}^{n}(x_{i}-\mu )^{T} \Sigma ^{-1}(x_{i}-\mu ) \end{aligned}$$
(27)
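As an illustration, Eq. (27) can be evaluated directly. This is a minimal sketch, assuming NumPy and a data matrix `X` with one sample per row; the function name `gaussian_log_likelihood` is hypothetical and chosen here for readability.

```python
import numpy as np

def gaussian_log_likelihood(X, mu, sigma):
    """Log-likelihood of the rows of X (n samples, D dimensions) under N(mu, sigma), following Eq. (27)."""
    n, D = X.shape
    diff = X - mu
    # slogdet is numerically safer than log(det(sigma))
    sign, logdet = np.linalg.slogdet(sigma)
    if sign <= 0:
        raise ValueError("sigma must be positive definite")
    # Sum over i of (x_i - mu)^T sigma^{-1} (x_i - mu), computed without forming the inverse
    quad = np.sum(diff * np.linalg.solve(sigma, diff.T).T)
    return -0.5 * n * D * np.log(2 * np.pi) - 0.5 * n * logdet - 0.5 * quad
```

For a single one-dimensional sample \(x = 1\) with \(\mu = 0\) and \(\Sigma = 1\), the function returns \(-\frac{1}{2}\log (2\pi ) - \frac{1}{2}\), which agrees with the scalar Gaussian density.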

The estimates of the mean and covariance matrix are obtained by differentiating \(L(x|\mu ,\Sigma )\) with respect to \(\mu\) and \(\Sigma ^{-1}\) and setting each derivative to zero.

$$\begin{aligned} \frac{\partial L(x|\mu ,\Sigma )}{\partial \mu }&= -\frac{1}{2}\frac{\partial }{\partial \mu } \left( \sum _{i=1}^{n}(x_{i}-\mu )^{T}\Sigma ^{-1}(x_{i}-\mu )\right) \\&= -\frac{1}{2}\frac{\partial }{\partial \mu } \left( \sum _{i=1}^{n} \left( x_{i}^{T}\Sigma ^{-1}x_{i} - \mu ^{T}\Sigma ^{-1}x_{i} - x_{i}^{T}\Sigma ^{-1}\mu + \mu ^{T} \Sigma ^{-1}\mu \right) \right) \\&= \frac{1}{2}\sum _{i=1}^{n}\left( \Sigma ^{-1}x_{i}+ \left( \Sigma ^{-1} \right) ^{T}x_{i}\right) -\frac{n}{2}\left( \Sigma ^{-1}+\left( \Sigma ^{-1} \right) ^{T}\right) \mu \\&= 0\\&\Rightarrow \hat{\mu }=\frac{1}{n}\sum _{i=1}^{n}x_{i} \end{aligned}$$
$$\begin{aligned} \frac{\partial L(x|\mu ,\Sigma )}{\partial \Sigma ^{-1}}&= \frac{\partial }{\partial \Sigma ^{-1} } \left( -\frac{n}{2}{\text {log}}(|\Sigma |) -\frac{1}{2} \sum ^{n}_{i=1} (x_{i}-\mu )^{T}\Sigma ^{-1}(x_{i}-\mu )\right) \\&= \frac{\partial }{\partial \Sigma ^{-1} } \left( -\frac{n}{2}{\text {log}}(|\Sigma |) -\frac{1}{2} \sum ^{n}_{i=1}{\text {tr}} \left[ \Sigma ^{-1} (x_{i}-\mu ) (x_{i}-\mu )^{T} \right] \right) \\&= \frac{\partial }{\partial \Sigma ^{-1} } \left( \frac{n}{2}{\text {log}}(|\Sigma ^{-1}|) -\frac{1}{2} {\text {tr}} \left[ \Sigma ^{-1} \sum ^{n}_{i=1} (x_{i}-\mu ) (x_{i}-\mu )^{T} \right] \right) \\&= \frac{n}{2}\Sigma -\frac{1}{2} \sum ^{n}_{i=1} (x_{i}-\mu ) (x_{i}-\mu )^{T} \\&= 0 \\&\Rightarrow \hat{\Sigma }=\frac{1}{n}\sum _{i=1}^{n}(x_{i}- \hat{\mu })(x_{i}- \hat{\mu })^{T} \end{aligned}$$
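The resulting estimators can be sketched in a few lines. This is an illustrative example, assuming NumPy; the function name `gaussian_mle` and the toy data are hypothetical. Note that the maximum-likelihood covariance is normalized by \(n\), not \(n-1\), so it corresponds to `np.cov` with `bias=True`.

```python
import numpy as np

def gaussian_mle(X):
    """Maximum-likelihood mean and covariance of the rows of X (n samples, D dimensions)."""
    n = X.shape[0]
    mu_hat = X.mean(axis=0)
    centered = X - mu_hat
    # Sum of outer products (x_i - mu_hat)(x_i - mu_hat)^T, divided by n
    sigma_hat = centered.T @ centered / n
    return mu_hat, sigma_hat

# Toy data: four points in two dimensions
X = np.array([[1.0, 2.0], [3.0, 0.0], [5.0, 4.0], [7.0, 2.0]])
mu_hat, sigma_hat = gaussian_mle(X)
print(mu_hat)     # mean vector [4, 2]
print(sigma_hat)  # covariance [[5, 1], [1, 2]]
```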

Appendix B: Jeffrey divergence for multivariate Gaussian distribution

We use the following identity and notation:

  • \(E(x^{T}\cdot \mathrm{A} \cdot x)= {\text {tr}}(\mathrm{A}\cdot \Sigma )+ \mu ^{T}\cdot \mathrm{A}\cdot \mu\)

  • \(\left\langle \cdot \right\rangle\) denotes the expectation with respect to \(p\).

$$\begin{aligned}&p(x|\mu _{1},\Sigma _{1})=\frac{1}{(2\pi )^{d/2}|\Sigma _{1}|^{1/2}}{\text {exp}}\left( -\frac{1}{2}(x-\mu _{1})^{T}\Sigma _{1}^{-1}(x-\mu _{1})\right) \\&q(x|\mu _{2},\Sigma _{2})=\frac{1}{(2\pi )^{d/2}|\Sigma _{2}|^{1/2}}{\text {exp}}\left( -\frac{1}{2}(x-\mu _{2})^{T}\Sigma _{2}^{-1}(x-\mu _{2})\right) \\&{\rm JD}(p,q)= {\rm KL}(p/q)+{\rm KL}(q/p) \end{aligned}$$

where KL is the Kullback-Leibler divergence.

$$\begin{aligned} {\rm KL}(p/q)&= \int {\text {log}}\left( \frac{p(x)}{q(x)}\right) p(x)\,dx\\&= \int \left( \frac{1}{2}{\text {log}}\left( \frac{|\Sigma _{2}|}{|\Sigma _{1}|}\right) - \frac{1}{2}(x-\mu _{1})^{T}\Sigma _{1}^{-1}(x-\mu _{1})\right. \\&\quad \left. +\frac{1}{2}(x-\mu _{2})^{T}\Sigma _{2}^{-1}(x-\mu _{2})\right) p(x)\,dx \\&= \frac{1}{2} {\text {log}}\left( \frac{|\Sigma _{2}|}{|\Sigma _{1}|}\right) - \frac{1}{2} \left\langle (x-\mu _{1})^{T}\Sigma _{1}^{-1}(x-\mu _{1}) \right\rangle \\&\quad + \frac{1}{2}\left\langle (x-\mu _{2})^{T}\Sigma _{2}^{-1}(x-\mu _{2}) \right\rangle \\&= \frac{1}{2} \left( {\text {log}}\left( \frac{|\Sigma _{2}|}{|\Sigma _{1}|}\right) - {\text {tr}} \left( \Sigma _{1}^{-1}\Sigma _{1}\right) + {\text {tr}} \left( \Sigma _{2}^{-1}\Sigma _{1}\right) \right) \\&\quad + \frac{1}{2} \left( (\mu _{1}-\mu _{2})^{T}\Sigma _{2}^{-1}(\mu _{1}-\mu _{2})\right) \\&= \frac{1}{2} \left( {\text {log}}\left( \frac{|\Sigma _{2}|}{|\Sigma _{1}|}\right) -d + {\text {tr}} \left( \Sigma _{2}^{-1}\Sigma _{1}\right) \right) \\&\quad + \frac{1}{2} \left( (\mu _{1}-\mu _{2})^{T}\Sigma _{2}^{-1}(\mu _{1}-\mu _{2}) \right) \end{aligned}$$
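The closed form on the last line translates directly into code. The following is a minimal sketch, assuming NumPy and positive-definite covariance matrices; the function name `kl_gaussian` is hypothetical.

```python
import numpy as np

def kl_gaussian(mu1, sigma1, mu2, sigma2):
    """KL(p/q) for p = N(mu1, sigma1) and q = N(mu2, sigma2), using the closed form above."""
    d = mu1.shape[0]
    diff = mu1 - mu2
    # Log-determinants; covariances are assumed positive definite, so the signs are ignored
    _, logdet1 = np.linalg.slogdet(sigma1)
    _, logdet2 = np.linalg.slogdet(sigma2)
    trace_term = np.trace(np.linalg.solve(sigma2, sigma1))  # tr(sigma2^{-1} sigma1)
    maha_term = diff @ np.linalg.solve(sigma2, diff)        # (mu1-mu2)^T sigma2^{-1} (mu1-mu2)
    return 0.5 * (logdet2 - logdet1 - d + trace_term + maha_term)
```

In one dimension with unit variances and means 0 and 1, this gives 0.5. Swapping the arguments generally changes the value, which is why the symmetric Jeffrey divergence sums both directions.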

Thus:

$$\begin{aligned} {\rm JD}(p,q)&= \frac{1}{2}\left( {\text {tr}}(\Sigma _{1}^{-1}\Sigma _{2})+{\text {tr}}(\Sigma _{2}^{-1}\Sigma _{1}) \right) \\&\quad + \frac{1}{2} \left( (\mu _{1}-\mu _{2})^{T}(\Sigma _{1}^{-1}+\Sigma _{2}^{-1})(\mu _{1}-\mu _{2})\right) -d \end{aligned}$$
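As a hedged illustration, the expression above can be computed for two clusters summarized by their means and covariances. This sketch assumes NumPy and positive-definite covariances; the function name `jeffrey_divergence_gaussian` and the example parameters are illustrative and not taken from the paper's experiments.

```python
import numpy as np

def jeffrey_divergence_gaussian(mu1, sigma1, mu2, sigma2):
    """Jeffrey divergence between N(mu1, sigma1) and N(mu2, sigma2), using the closed form above."""
    d = mu1.shape[0]
    diff = mu1 - mu2
    # tr(sigma1^{-1} sigma2) + tr(sigma2^{-1} sigma1)
    trace_term = np.trace(np.linalg.solve(sigma1, sigma2)) + np.trace(np.linalg.solve(sigma2, sigma1))
    # (mu1 - mu2)^T (sigma1^{-1} + sigma2^{-1}) (mu1 - mu2)
    maha_term = diff @ (np.linalg.solve(sigma1, diff) + np.linalg.solve(sigma2, diff))
    return 0.5 * trace_term + 0.5 * maha_term - d

# Two hypothetical clusters in two dimensions
mu_a, sigma_a = np.array([0.0, 0.0]), np.eye(2)
mu_b, sigma_b = np.array([1.0, 0.0]), 2.0 * np.eye(2)
jd = jeffrey_divergence_gaussian(mu_a, sigma_a, mu_b, sigma_b)
print(jd)  # 0.5 * 5 + 0.5 * 1.5 - 2 = 1.25
```

The divergence is symmetric in its arguments and equals zero when the two distributions coincide, since the log-determinant terms of the two KL divergences cancel.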

Appendix C: Similarity measures

The four similarity measures we have used are described in this section. Let \(P1\) and \(P2\) be two partitions. We define \(a\) as the number of object pairs that belong to the same cluster in both \(P1\) and \(P2\). Let \(b\) be the number of object pairs that belong to different clusters in both partitions. Let \(c\) be the number of object pairs that belong to the same cluster in \(P1\) but to different clusters in \(P2\). Finally, let \(d\) be the number of pairs that belong to different clusters in \(P1\) but to the same cluster in \(P2\). The four similarity measures are defined as:

$$\begin{aligned} {\text {Rand}}&= \frac{a+b}{a+b+c+d} \\ {\text {Fowlkes-Mallows}}&= \frac{a}{\sqrt{(a+d)(a+c)}} \\ {\text {Jaccard}}&= \frac{a}{a+c+d}\\ {\text {Adjusted-Rand}}&= \frac{a-\frac{(a+d)(a+c)}{a+b+c+d}}{\frac{(a+d)+(a+c)}{2}-\frac{(a+d)(a+c)}{a+b+c+d}} \end{aligned}$$
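The pair counts and the four measures can be sketched as follows. This is an illustrative implementation using only the Python standard library; the function names `pair_counts` and `similarity_measures` are hypothetical. The adjusted Rand index is computed in its standard Hubert and Arabie pair-count form, whose denominator uses \(\frac{(a+d)+(a+c)}{2}\). Degenerate partitions that make a denominator zero (for example, all objects in singleton clusters) are assumed not to occur and are not handled.

```python
from itertools import combinations
from math import sqrt

def pair_counts(labels1, labels2):
    """Count object pairs by agreement between two partitions given as label sequences."""
    a = b = c = d = 0
    for i, j in combinations(range(len(labels1)), 2):
        same1 = labels1[i] == labels1[j]
        same2 = labels2[i] == labels2[j]
        if same1 and same2:
            a += 1      # same cluster in both partitions
        elif not same1 and not same2:
            b += 1      # different clusters in both partitions
        elif same1:
            c += 1      # same in P1, different in P2
        else:
            d += 1      # different in P1, same in P2
    return a, b, c, d

def similarity_measures(labels1, labels2):
    """Rand, Fowlkes-Mallows, Jaccard and adjusted Rand indexes from pair counts."""
    a, b, c, d = pair_counts(labels1, labels2)
    total = a + b + c + d
    expected = (a + d) * (a + c) / total  # chance-level value of a
    return {
        "rand": (a + b) / total,
        "fowlkes_mallows": a / sqrt((a + d) * (a + c)),
        "jaccard": a / (a + c + d),
        "adjusted_rand": (a - expected) / (0.5 * ((a + d) + (a + c)) - expected),
    }

# Four objects: P2 splits the second cluster of P1 in two
scores = similarity_measures([0, 0, 1, 1], [0, 0, 1, 2])
print(scores)
```

For this example the pair counts are \(a=1\), \(b=4\), \(c=1\), \(d=0\), giving a Rand index of \(5/6\), a Jaccard index of \(1/2\) and an adjusted Rand index of \(4/7\).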

Cite this article

Said, A.B., Hadjidj, R. & Foufou, S. Cluster validity index based on Jeffrey divergence. Pattern Anal Applic 20, 21–31 (2017). https://doi.org/10.1007/s10044-015-0453-7

  • DOI: https://doi.org/10.1007/s10044-015-0453-7
