Abstract
Data analysis often requires the unsupervised partitioning of a data set into clusters. Clustering is an important but difficult problem. In the absence of prior knowledge about the shape of the clusters, similarity measures for a clustering technique are hard to specify. In this work, we propose a framework that learns from the structure of the data. Learning is accomplished by applying the K-means algorithm to the data multiple times with varying initial centers, via entropy minimization. The result is an expected number of clusters and a new similarity matrix that gives the proportion of occurrence between each pair of patterns. Using the expected number of clusters, the final clustering is obtained by clustering a sparse graph of this matrix.
Copyright information
© 2007 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Okafor, A., Pardalos, P., Ragle, M. (2007). Data Mining Via Entropy and Graph Clustering. In: Pardalos, P.M., Boginski, V.L., Vazacopoulos, A. (eds) Data Mining in Biomedicine. Springer Optimization and Its Applications, vol 7. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-69319-4_7
DOI: https://doi.org/10.1007/978-0-387-69319-4_7
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-69318-7
Online ISBN: 978-0-387-69319-4
eBook Packages: Biomedical and Life Sciences, Biomedical and Life Sciences (R0)