Clustering Biological Data Using Enhanced k-Means Algorithm

  • K. A. Abdul Nazeer
  • M. P. Sebastian
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 60)


Abstract

With the advent of modern scientific methods for data collection, huge volumes of biological data are now being accumulated in various data banks. The enormity of such data and the complexity of biological networks greatly increase the challenge of understanding and interpreting the underlying data. Effective and efficient data mining techniques are essential to unearth useful information from it. A first step towards addressing this challenge is the use of clustering techniques, which help to recognize natural groupings and interesting patterns in the dataset under consideration. The classical k-means clustering algorithm is widely used in many practical applications. However, it is computationally expensive, since every iteration computes the distance from each data point to each cluster centroid, and the accuracy of the final clusters is not always guaranteed, because the outcome depends on the randomly chosen initial centroids. This paper proposes a heuristic method for improving the accuracy and efficiency of the k-means clustering algorithm. The modified algorithm is then applied to clustering biological data, with promising results.
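For reference, the following is a minimal sketch of the classical k-means baseline the abstract refers to. It is not the authors' enhanced algorithm, which this page does not describe. It assumes numeric points given as tuples, squared Euclidean distance, and initial centroids sampled at random from the data; the names `kmeans`, `sq_dist` and `mean` are illustrative only. The sketch counts distance computations to show where the cost comes from (n × k per iteration) and exposes the random seed, so you can check whether different initial centroids lead to different final clusterings.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def kmeans(data: List[Point], k: int, seed: int = 0, max_iter: int = 100):
    """Classical k-means with random initial centroids (illustrative sketch).

    Returns (centroids, labels, sse, distance_computations).
    """
    rng = random.Random(seed)
    # Random initialization: k data points sampled without replacement.
    centroids = list(rng.sample(data, k))
    labels = [-1] * len(data)
    n_dist = 0
    for _ in range(max_iter):
        changed = False
        # Assignment step: each of the n points is compared with all k centroids.
        for i, p in enumerate(data):
            dists = [sq_dist(p, c) for c in centroids]
            n_dist += k
            best = min(range(k), key=dists.__getitem__)
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:  # no point switched clusters: converged
            break
        # Update step: move each centroid to the mean of its current members.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean(members)
    # Sum of squared errors, a common measure of clustering quality.
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(data, labels))
    return centroids, labels, sse, n_dist


if __name__ == "__main__":
    toy = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0),
           (3.5, 5.0), (4.5, 5.0), (3.5, 4.5)]
    for seed in range(5):
        _, labels, sse, n_dist = kmeans(toy, k=2, seed=seed)
        print(f"seed={seed} labels={labels} SSE={sse:.3f} distances={n_dist}")
```

Every pass of the assignment loop performs n × k distance computations regardless of how few points actually change clusters. This is the cost that efficiency-oriented variants try to reduce. The `seed` argument isolates the dependence on initial centroids.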


Keywords: data mining, clustering, k-means algorithm



Copyright information

© Springer Science+Business Media B.V. 2010

Authors and Affiliations

  1. Department of Computer Science and Engineering, National Institute of Technology Calicut, Kozhikode, India
