Cluster validity index based on Jeffrey divergence

Said, Ahmed Ben; Hadjidj, Rachid; Foufou, Sebti

doi:10.1007/s10044-015-0453-7

Theoretical Advances
Published: 31 January 2015

Volume 20, pages 21–31, (2017)
Pattern Analysis and Applications

Ahmed Ben Said¹,², Rachid Hadjidj¹ & Sebti Foufou¹

Abstract

Cluster validity indexes are important tools that serve two purposes: comparing the performance of clustering algorithms and determining the number of clusters that best fits the data. These indexes are generally constructed by combining a measure of compactness with a measure of separation. A classical measure of compactness is the variance, while separation is usually measured by the distance between cluster centers. However, such a distance does not always reflect how well the clusters are separated and sometimes gives misleading results. In this paper, we propose a new cluster validity index in which the Jeffrey divergence is used to measure the separation between clusters. Experiments are conducted on different types of data, and a comparison with widely used cluster validity indexes shows that the proposed index outperforms them.

Acknowledgments

This publication was made possible by NPRP Grant # 4-1165-2-453 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.

Author information

Authors and Affiliations

CSE Department, College of Engineering, Qatar University, P.O. Box 2713, Doha, Qatar
Ahmed Ben Said, Rachid Hadjidj & Sebti Foufou
LE2I Lab, UMR CNRS 6306, University of Burgundy, BP 47870, 21078, Dijon, France
Ahmed Ben Said

Corresponding author

Correspondence to Ahmed Ben Said.

Appendices

Appendix A: parameters estimation

In all the expressions below, $x$ is a vector of random variables whose mean vector and covariance matrix are given by $E(x) = \mu$ and $E((x-\mu )(x-\mu )^{T}) = \Sigma$, where $E$ denotes the expectation. We use the following matrix identities:

$\frac{\partial {\rm A}^{T}\cdot x}{\partial x}=\frac{\partial x^{T}\cdot {\rm A}}{\partial x}= \mathrm{A}$
$\frac{\partial x^{T}\cdot \mathrm{A}\cdot x}{\partial x}=\mathrm{A}^{T}+\mathrm{A}$
$\frac{\partial }{\partial \mathrm{A}} {\text {log}} |\mathrm{A}|= \left( \mathrm{A}^{-1}\right) ^{T}$
$\frac{\partial }{\partial \mathrm{A}} {\text {tr}} \left[ \mathrm{AB}\right] =\frac{\partial }{\partial \mathrm{A}} {\text {tr}} \left[ \mathrm{BA}\right] =\mathrm{B}^{T}$

The log likelihood of the multivariate Gaussian distribution is given by:

$$\begin{aligned} L(x|\mu ,\Sigma )&= \frac{-nD}{2}{\text {log}}(2\pi ) - \frac{n}{2}{\text {log}}(|\Sigma |) \nonumber \\&\quad -\frac{1}{2} \sum _{i=1}^{n}(x_{i}-\mu )^{T} \Sigma ^{-1}(x_{i}-\mu ) \end{aligned}$$

(27)
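As an illustration, the log likelihood of Eq. (27) can be evaluated numerically. The following is a minimal sketch, assuming NumPy is available and that the samples are stored as the rows of an array; the function name `gaussian_log_likelihood` is ours and does not come from the paper.

```python
import numpy as np

def gaussian_log_likelihood(X, mu, sigma):
    """Log likelihood of n samples (rows of X) under N(mu, sigma), as in Eq. (27)."""
    X = np.asarray(X, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    n, D = X.shape
    diff = X - mu                                   # (n, D) centered samples
    # slogdet is numerically safer than log(det(sigma)).
    _, logdet = np.linalg.slogdet(sigma)
    # Sum over i of (x_i - mu)^T sigma^{-1} (x_i - mu), without forming the inverse.
    quad = np.sum(diff.T * np.linalg.solve(sigma, diff.T))
    return -0.5 * n * D * np.log(2.0 * np.pi) - 0.5 * n * logdet - 0.5 * quad
```

The sketch assumes that `sigma` is symmetric positive definite; no check is performed.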

The estimates of the mean and covariance matrix are obtained by computing the derivatives of $L(x|\mu ,\Sigma )$ with respect to $\mu$ and $\Sigma ^{-1}$ and setting them equal to zero.

$$\begin{aligned} \frac{\partial L(x|\mu ,\Sigma )}{\partial \mu }&= -\frac{1}{2}\frac{\partial }{\partial \mu } \left( \sum _{i=1}^{n}(x_{i}-\mu )^{T}\Sigma ^{-1}(x_{i}-\mu )\right) \\&= -\frac{1}{2}\frac{\partial }{\partial \mu } \left( \sum _{i=1}^{n} \left( x_{i}^{T}\Sigma ^{-1}x_{i} - \mu ^{T}\Sigma ^{-1}x_{i} - x_{i}^{T}\Sigma ^{-1}\mu \right. \right. \\&\quad +\left. \left. \mu ^{T} \Sigma ^{-1}\mu \right) \right) \\&= \frac{1}{2}\sum _{i=1}^{n}\left( \Sigma ^{-1}x_{i}+ \left( \Sigma ^{-1} \right) ^{T}x_{i}\right) \\&\quad -\frac{n}{2}\left( \Sigma ^{-1}+\left( \Sigma ^{-1} \right) ^{T}\right) \mu \\&= 0\\&\Rightarrow \hat{\mu }=\frac{1}{n}\sum _{i=1}^{n}x_{i} \end{aligned}$$

$$\begin{aligned} \frac{\partial L(x|\mu ,\Sigma )}{\partial \Sigma ^{-1}}&= \frac{\partial }{\partial \Sigma ^{-1} } \left( -\frac{n}{2}{\text {log}}(|\Sigma |)\right. \\&\left. \quad -\frac{1}{2} \sum ^{n}_{i=1} (x_{i}-\mu )^{T}\Sigma ^{-1}(x_{i}-\mu )\right) \\&= \frac{\partial }{\partial \Sigma ^{-1} } \left( \frac{n}{2}{\text {log}}(|\Sigma ^{-1}|) -\frac{1}{2} {\text {tr}} \left[ \Sigma ^{-1} \sum ^{n}_{i=1} (x_{i}-\mu )\cdot \right. \right. \\&\left. \left. \quad (x_{i}-\mu )^{T} \right] \right) \\&= \frac{n}{2}\Sigma -\frac{1}{2} \sum ^{n}_{i=1} (x_{i}-\mu )(x_{i}-\mu )^{T} \\&= 0 \\&\Rightarrow \hat{\Sigma }=\frac{1}{n}\sum _{i=1}^{n}(x_{i}- \hat{\mu })(x_{i}- \hat{\mu })^{T} \end{aligned}$$
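The two estimators derived in this appendix can be computed directly from a sample. The sketch below assumes NumPy and uses the usual $1/n$ normalization of the maximum-likelihood covariance estimator (not the unbiased $1/(n-1)$ version); the function name `gaussian_mle` is an illustrative choice of ours.

```python
import numpy as np

def gaussian_mle(X):
    """Maximum-likelihood estimates of the mean vector and covariance matrix.

    X holds one sample per row; returns (mu_hat, sigma_hat).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    mu_hat = X.mean(axis=0)                  # (1/n) * sum_i x_i
    centered = X - mu_hat
    sigma_hat = centered.T @ centered / n    # (1/n) * sum_i (x_i - mu)(x_i - mu)^T
    return mu_hat, sigma_hat
```

For a given sample, `sigma_hat` should agree with `np.cov(X, rowvar=False, bias=True)`, which applies the same $1/n$ normalization.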

Appendix B: Jeffrey divergence for multivariate Gaussian distribution

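We use the following identity for the expectation of a quadratic form, where $x$ has mean $\mu$ and covariance matrix $\Sigma$:

$E(x^{T}\cdot \mathrm{A} \cdot x)= {\text {tr}}(\mathrm{A}\cdot \Sigma )+ \mu ^{T}\cdot \mathrm{A}\cdot \mu$

In the derivation below, $\left\langle \cdot \right\rangle$ also denotes the expectation, taken with respect to $p$.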

$$\begin{aligned}&p(x|\mu _{1},\Sigma _{1})=\frac{1}{(2\pi )^{d/2}|\Sigma _{1}|^{1/2}}{\text {exp}}\left( -\frac{1}{2}(x-\mu _{1})^{T}\Sigma _{1}^{-1}(x-\mu _{1})\right) \\&q(x|\mu _{2},\Sigma _{2})=\frac{1}{(2\pi )^{d/2}|\Sigma _{2}|^{1/2}}{\text {exp}}\left( -\frac{1}{2}(x-\mu _{2})^{T}\Sigma _{2}^{-1}(x-\mu _{2})\right) \\&{\rm JD}(p,q)= {\rm KL}(p/q)+{\rm KL}(q/p) \end{aligned}$$

where KL is the Kullback-Leibler divergence.

$$\begin{aligned} {\rm KL}(p/q)&= \int {\text {log}}\left( \frac{p(x)}{q(x)}\right) p(x)\,dx\\&= \int \left( \frac{1}{2}{\text {log}}\left( \frac{|\Sigma _{2}|}{|\Sigma _{1}|}\right) - \frac{1}{2}(x-\mu _{1})^{T}\Sigma _{1}^{-1}(x-\mu _{1})\right. \\&\quad \left. +\frac{1}{2}(x-\mu _{2})^{T}\Sigma _{2}^{-1}(x-\mu _{2})\right) p(x)\,dx \\&= \frac{1}{2} {\text {log}}\left( \frac{|\Sigma _{2}|}{|\Sigma _{1}|}\right) - \frac{1}{2} \left\langle (x-\mu _{1})^{T}\Sigma _{1}^{-1}(x-\mu _{1}) \right\rangle \\&\quad + \frac{1}{2}\left\langle (x-\mu _{2})^{T}\Sigma _{2}^{-1}(x-\mu _{2}) \right\rangle \\&= \frac{1}{2} \left( {\text {log}}\left( \frac{|\Sigma _{2}|}{|\Sigma _{1}|}\right) - {\text {tr}} \left( \Sigma _{1}^{-1}\Sigma _{1}\right) + {\text {tr}} \left( \Sigma _{2}^{-1}\Sigma _{1}\right) \right) \\&\quad + \frac{1}{2} \left( (\mu _{1}-\mu _{2})^{T}\Sigma _{2}^{-1}(\mu _{1}-\mu _{2})\right) \\&= \frac{1}{2} \left( {\text {log}}\left( \frac{|\Sigma _{2}|}{|\Sigma _{1}|}\right) -d + {\text {tr}} \left( \Sigma _{2}^{-1}\Sigma _{1}\right) \right) \\&\quad + \frac{1}{2} \left( (\mu _{1}-\mu _{2})^{T}\Sigma _{2}^{-1}(\mu _{1}-\mu _{2}) \right) \end{aligned}$$

Thus:

$$\begin{aligned} {\rm JD}(p,q)&= \frac{1}{2}\left( {\text {tr}}(\Sigma _{1}^{-1}\Sigma _{2})+{\text {tr}}(\Sigma _{2}^{-1}\Sigma _{1}) \right) \\&\quad + \frac{1}{2} \left( (\mu _{1}-\mu _{2})^{T}(\Sigma _{1}^{-1}+\Sigma _{2}^{-1})(\mu _{1}-\mu _{2})\right) -d \end{aligned}$$
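The closed-form expression above is straightforward to evaluate. The following sketch, assuming NumPy and symmetric positive definite covariance matrices, implements both the Kullback-Leibler divergence between two Gaussians and the resulting Jeffrey divergence, so that one can check numerically that the log-determinant terms cancel in the sum. The function names are ours, chosen for illustration.

```python
import numpy as np

def kl_gaussian(mu1, sigma1, mu2, sigma2):
    """KL(p/q) for p = N(mu1, sigma1) and q = N(mu2, sigma2)."""
    mu1, mu2 = np.asarray(mu1, dtype=float), np.asarray(mu2, dtype=float)
    sigma1, sigma2 = np.asarray(sigma1, dtype=float), np.asarray(sigma2, dtype=float)
    d = mu1.shape[0]
    diff = mu1 - mu2
    _, logdet1 = np.linalg.slogdet(sigma1)
    _, logdet2 = np.linalg.slogdet(sigma2)
    trace_term = np.trace(np.linalg.solve(sigma2, sigma1))   # tr(sigma2^{-1} sigma1)
    quad = diff @ np.linalg.solve(sigma2, diff)              # Mahalanobis-type term
    return 0.5 * (logdet2 - logdet1 - d + trace_term + quad)

def jeffrey_gaussian(mu1, sigma1, mu2, sigma2):
    """Jeffrey divergence JD(p, q) = KL(p/q) + KL(q/p), closed form."""
    mu1, mu2 = np.asarray(mu1, dtype=float), np.asarray(mu2, dtype=float)
    sigma1, sigma2 = np.asarray(sigma1, dtype=float), np.asarray(sigma2, dtype=float)
    d = mu1.shape[0]
    diff = mu1 - mu2
    traces = (np.trace(np.linalg.solve(sigma1, sigma2))
              + np.trace(np.linalg.solve(sigma2, sigma1)))
    # (mu1 - mu2)^T (sigma1^{-1} + sigma2^{-1}) (mu1 - mu2)
    quad = diff @ (np.linalg.solve(sigma1, diff) + np.linalg.solve(sigma2, diff))
    return 0.5 * traces + 0.5 * quad - d
```

Unlike the Kullback-Leibler divergence, the Jeffrey divergence is symmetric in its two arguments, and it is zero when the two Gaussians coincide.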

Appendix C: Similarity measures

The four similarity measures we have used are described in this section. Let $P1$ and $P2$ be two partitions. We define $a$ as the number of object pairs that belong to the same cluster in both $P1$ and $P2$. Let $b$ be the number of object pairs that belong to different clusters in both partitions. Let $c$ be the number of object pairs that belong to the same cluster in $P1$ but to different clusters in $P2$. Finally, let $d$ be the number of pairs that belong to different clusters in $P1$ but to the same cluster in $P2$. The four similarity measures are defined as:

$$\begin{aligned} {\text {Rand}}&= \frac{a+b}{a+b+c+d} \\ {\text {Fowlkes-Mallows}}&= \frac{a}{\sqrt{(a+d)(a+c)}} \\ {\text {Jaccard}}&= \frac{a}{a+c+d}\\ {\text {Adjusted-Rand}}&= \frac{a-\frac{(a+d)(a+c)}{a+b+c+d}}{\frac{(a+d)+(a+c)}{2}-\frac{(a+d)(a+c)}{a+b+c+d}} \end{aligned}$$
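The pair counts $a$, $b$, $c$ and $d$ can be obtained by enumerating all object pairs. The sketch below uses only the Python standard library and assumes each partition is given as a list of cluster labels, one per object; the function names are illustrative. For the adjusted Rand index it uses the standard form in which the first term of the denominator is $\frac{(a+d)+(a+c)}{2}$, so that two identical partitions score exactly 1.

```python
from itertools import combinations
from math import sqrt

def pair_counts(labels1, labels2):
    """Count object pairs by agreement between two partitions (a, b, c, d)."""
    a = b = c = d = 0
    for i, j in combinations(range(len(labels1)), 2):
        same1 = labels1[i] == labels1[j]
        same2 = labels2[i] == labels2[j]
        if same1 and same2:
            a += 1          # same cluster in both partitions
        elif not same1 and not same2:
            b += 1          # different clusters in both partitions
        elif same1:
            c += 1          # same in P1, different in P2
        else:
            d += 1          # different in P1, same in P2
    return a, b, c, d

def similarity_measures(labels1, labels2):
    """Rand, Fowlkes-Mallows, Jaccard and adjusted Rand from the pair counts."""
    a, b, c, d = pair_counts(labels1, labels2)
    m = a + b + c + d                      # total number of pairs
    expected = (a + d) * (a + c) / m       # chance-level value of a
    # Degenerate partitions (e.g. no co-clustered pair) give zero denominators;
    # this sketch does not handle them.
    return {
        "rand": (a + b) / m,
        "fowlkes_mallows": a / sqrt((a + d) * (a + c)),
        "jaccard": a / (a + c + d),
        "adjusted_rand": (a - expected) / (((a + d) + (a + c)) / 2 - expected),
    }
```

The enumeration is quadratic in the number of objects, which is acceptable for small data sets; a contingency-table formulation would scale better.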

About this article

Cite this article

Said, A.B., Hadjidj, R. & Foufou, S. Cluster validity index based on Jeffrey divergence. Pattern Anal Applic 20, 21–31 (2017). https://doi.org/10.1007/s10044-015-0453-7

Received: 13 July 2014
Accepted: 12 January 2015
Published: 31 January 2015
Issue Date: February 2017
DOI: https://doi.org/10.1007/s10044-015-0453-7
