Using the Negentropy Increment to Determine the Number of Clusters

Lago-Fernández, Luis F.; Corbacho, Fernando

doi:10.1007/978-3-642-02478-8_56


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5517)

Included in the following conference series:

International Work-Conference on Artificial Neural Networks


Abstract

We introduce a new validity index for crisp clustering that is based on the average normality of the clusters. A normal cluster is optimal in the sense of maximum uncertainty, or minimum structure, so partitioning it further will not reveal additional substructure. To characterize the normality of a cluster we use the negentropy, a standard measure of distance to normality that evaluates the difference between the cluster's entropy and the entropy of a normal distribution with the same covariance matrix. Although the definition of the negentropy involves the differential entropy, we show that its explicit computation can be avoided by considering only negentropy increments with respect to the initial data distribution. The resulting negentropy increment validity index requires only the determinants of covariance matrices. We apply the index to randomly generated problems and show that it assesses the number of clusters better than other indices.
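The abstract's central claim can be made concrete. Assume a crisp partition into clusters C_1, ..., C_k with priors p_i. The data entropy then splits as H(X) = Σ_i p_i H(X_i) − Σ_i p_i log p_i, and a normal distribution with covariance Σ has entropy (d/2) log(2πe) + (1/2) log|Σ|. With those two facts, the differential entropies H(X_i) cancel when the cluster-averaged negentropy is compared with the negentropy of the whole data set. Under these assumptions, the increment reduces to

$$\Delta J = \frac{1}{2}\sum_{i=1}^{k} p_i \log|\Sigma_i| - \frac{1}{2}\log|\Sigma_0| - \sum_{i=1}^{k} p_i \log p_i,$$

where Σ_0 is the covariance of the full data set and Σ_i that of cluster i. This formula is a reconstruction from the abstract's description, not one quoted from the paper. Lower values mean the clusters are closer to normal, so candidate partitions would be compared by preferring the smallest ΔJ.

The pure-Python sketch below evaluates this expression. The function names, the divide-by-n covariance estimate, and the toy two-blob data are illustrative assumptions.

```python
import math
import random
from collections import defaultdict


def covariance(points):
    """Maximum-likelihood (divide-by-n) covariance matrix of d-dimensional points."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for p in points:
        diff = [p[j] - mean[j] for j in range(d)]
        for a in range(d):
            for b in range(d):
                cov[a][b] += diff[a] * diff[b] / n
    return cov


def log_det(matrix):
    """log|A| for a positive-definite A, via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]  # work on a copy
    d = len(m)
    total = 0.0
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular covariance matrix (cluster too small or degenerate)")
        m[col], m[pivot] = m[pivot], m[col]
        # A positive-definite matrix has a positive determinant, so the sign
        # flips caused by row swaps can be ignored: sum log|pivot|.
        total += math.log(abs(m[col][col]))
        for r in range(col + 1, d):
            factor = m[r][col] / m[col][col]
            for c in range(col, d):
                m[r][c] -= factor * m[col][c]
    return total


def negentropy_increment(points, labels):
    """Delta J as reconstructed from the abstract (assumption, not the paper's code):
    1/2 sum_i p_i log|Sigma_i| - 1/2 log|Sigma_0| - sum_i p_i log p_i."""
    n = len(points)
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    delta = -0.5 * log_det(covariance(points))  # whole-data term
    for members in clusters.values():
        p_i = len(members) / n  # cluster prior
        delta += 0.5 * p_i * log_det(covariance(members)) - p_i * math.log(p_i)
    return delta


# Toy data (hypothetical): two unit-variance 2-D blobs centred at (0, 0) and (10, 0).
rng = random.Random(0)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(300)]
points += [(rng.gauss(10, 1), rng.gauss(0, 1)) for _ in range(300)]

one_cluster = [0] * len(points)
true_split = [0] * 300 + [1] * 300
bad_split = [int(y > 0) for _, y in points]  # cuts across both blobs

for name, labels in [("k=1", one_cluster), ("true k=2", true_split), ("bad k=2", bad_split)]:
    print(name, round(negentropy_increment(points, labels), 3))
```

The single-cluster labelling gives ΔJ = 0 by construction. The labelling that recovers the two blobs should give a clearly negative value, while the split that cuts across both blobs should give a larger one. The exact numbers depend on the sampled points. Pure Python keeps the sketch dependency-free; with NumPy, `numpy.linalg.slogdet` could replace `log_det`.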

Author information

Authors and Affiliations

Departamento de Ingeniería Informática, Escuela Politécnica Superior, Universidad Autónoma de Madrid, 28049, Madrid, Spain
Luis F. Lago-Fernández
Cognodata Consulting, Calle Caracas 23, 28010, Madrid, Spain
Fernando Corbacho

Editor information

Editors and Affiliations

Departamento de Ingeniería Electrónica, Universitat Politècnica de Catalunya (UPC), E.T.S.I. de Telecomunicación, Campus Norte, Edificio C4, C/ Jordi Girona, 1-3, E08034, Barcelona, Spain
Joan Cabestany
Grupo ISIS, Dpto. Tecnología Electrónica ETSI Telecomunicación, Universidad de Málaga, Campus de Teatinos, 29071, Málaga, Spain
Francisco Sandoval
Department of Computer Architecture and Computer Technology, University of Granada, Spain
Alberto Prieto
Department of Informatics, University of Salamanca, Salamanca, Spain
Juan M. Corchado

About this paper

Cite this paper

Lago-Fernández, L.F., Corbacho, F. (2009). Using the Negentropy Increment to Determine the Number of Clusters. In: Cabestany, J., Sandoval, F., Prieto, A., Corchado, J.M. (eds) Bio-Inspired Systems: Computational and Ambient Intelligence. IWANN 2009. Lecture Notes in Computer Science, vol 5517. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02478-8_56

DOI: https://doi.org/10.1007/978-3-642-02478-8_56
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02477-1
Online ISBN: 978-3-642-02478-8
eBook Packages: Computer Science, Computer Science (R0)
