Data Mining Via Entropy and Graph Clustering

  • Chapter
Data Mining in Biomedicine

Part of the book series: Springer Optimization and Its Applications (SOIA, volume 7)

Abstract

Data analysis often requires the unsupervised partitioning of a data set into clusters. Clustering data is an important but difficult problem. In the absence of prior knowledge about the shape of the clusters, similarity measures for a clustering technique are hard to specify. In this work, we propose a framework that learns from the structure of the data. Learning is accomplished by applying the K-means algorithm to the data multiple times with varying initial centers, via entropy minimization. The result is an expected number of clusters and a new similarity measure matrix that gives the proportion of occurrence between each pair of patterns. Using the expected number of clusters, the final clustering of the data is obtained by clustering a sparse graph of this matrix.
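The repeated K-means step described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: that "proportion of occurrence" means the fraction of runs in which two patterns land in the same cluster, that a plain Lloyd-style K-means with random initial centers is an acceptable stand-in, and that the number of clusters k is fixed in advance. The entropy minimization step and the estimation of the expected number of clusters are not attempted here. The function names `kmeans` and `co_occurrence` and all parameters are hypothetical.

```python
import numpy as np


def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd-style K-means; returns one cluster label per row of X."""
    # Pick k distinct data points as the initial centers (assumed scheme).
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return labels


def co_occurrence(X, k, n_runs=20, seed=0):
    """Fraction of K-means runs in which each pair of points shares a cluster."""
    rng = np.random.default_rng(seed)
    n = len(X)
    counts = np.zeros((n, n))
    for _ in range(n_runs):
        labels = kmeans(X, k, rng)  # each run draws new initial centers
        counts += labels[:, None] == labels[None, :]
    return counts / n_runs


# Toy data (made up for illustration): two Gaussian blobs in the plane.
data_rng = np.random.default_rng(1)
X = np.vstack([
    data_rng.normal(0.0, 0.3, size=(10, 2)),
    data_rng.normal(5.0, 0.3, size=(10, 2)),
])
S = co_occurrence(X, k=2)
```

Under these assumptions, `S` is a symmetric matrix with ones on the diagonal and entries between 0 and 1. A sparse graph could then be formed by keeping only the pairs whose entry exceeds some threshold, although the abstract does not say how the authors sparsify the matrix.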

Copyright information

© 2007 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Okafor, A., Pardalos, P., Ragle, M. (2007). Data Mining Via Entropy and Graph Clustering. In: Pardalos, P.M., Boginski, V.L., Vazacopoulos, A. (eds) Data Mining in Biomedicine. Springer Optimization and Its Applications, vol 7. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-69319-4_7