Using the Negentropy Increment to Determine the Number of Clusters

  • Conference paper
Bio-Inspired Systems: Computational and Ambient Intelligence (IWANN 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5517)

Abstract

We introduce a new validity index for crisp clustering that is based on the average normality of the clusters. A normal cluster is optimal in the sense of maximum uncertainty, or minimum structure, and so performing further partitions on it will not reveal additional substructures. To characterize the normality of a cluster we use the negentropy, a standard measure of distance to normality which evaluates the difference between the cluster’s entropy and the entropy of a normal distribution with the same covariance matrix. Although the definition of the negentropy involves the differential entropy, we show that it is possible to avoid its explicit computation by considering only negentropy increments with respect to the initial data distribution. The resulting negentropy increment validity index only requires the computation of determinants of covariance matrices. We have applied the index to randomly generated problems, and show that it provides better results than other indices for the assessment of the number of clusters.
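As a rough illustration of why only covariance determinants are needed, the sketch below assumes the increment takes the form ΔJ = (1/2) Σ_i p_i log|Σ_i| − (1/2) log|Σ_0| − Σ_i p_i log p_i, where Σ_0 is the covariance matrix of the whole data set, Σ_i the covariance matrix of cluster i, and p_i the fraction of points assigned to cluster i. This expression follows from the definition of negentropy given above when the clusters occupy disjoint regions, because the differential entropy of the full data then cancels out. It is not quoted from the paper: the function name `negentropy_increment`, the NumPy implementation, and the convention that lower values indicate a better partition are assumptions made here.

```python
import numpy as np


def negentropy_increment(X, labels):
    """Hypothetical sketch of a negentropy-increment index for a crisp partition.

    Assumed form (derived from the definition of negentropy, not taken
    verbatim from the paper):
        dJ = 0.5 * sum_i p_i * log|S_i| - 0.5 * log|S_0| - sum_i p_i * log(p_i)
    X      : array of shape (n_samples, n_features)
    labels : array of shape (n_samples,) with one cluster id per sample
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n = X.shape[0]

    # Log-determinant of the covariance matrix of the full data set (S_0).
    _, logdet_all = np.linalg.slogdet(np.atleast_2d(np.cov(X, rowvar=False)))
    total = -0.5 * logdet_all

    for k in np.unique(labels):
        Xk = X[labels == k]
        p = Xk.shape[0] / n  # fraction of points in cluster k
        # Log-determinant of the covariance matrix of cluster k (S_k).
        _, logdet_k = np.linalg.slogdet(np.atleast_2d(np.cov(Xk, rowvar=False)))
        total += 0.5 * p * logdet_k - p * np.log(p)

    return total
```

Under this reading, one would evaluate the function on the partition obtained for each candidate number of clusters and prefer the partition with the smallest value. Each cluster needs more points than dimensions so that its covariance matrix is non-singular.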

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lago-Fernández, L.F., Corbacho, F. (2009). Using the Negentropy Increment to Determine the Number of Clusters. In: Cabestany, J., Sandoval, F., Prieto, A., Corchado, J.M. (eds) Bio-Inspired Systems: Computational and Ambient Intelligence. IWANN 2009. Lecture Notes in Computer Science, vol 5517. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02478-8_56

  • DOI: https://doi.org/10.1007/978-3-642-02478-8_56

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-02477-1

  • Online ISBN: 978-3-642-02478-8

  • eBook Packages: Computer Science, Computer Science (R0)
