DK-means: a deterministic K-means clustering algorithm for gene expression analysis

  • Theoretical Advances
Pattern Analysis and Applications

Abstract

Clustering has been widely applied to interpret the underlying patterns in microarray gene expression profiles, and many clustering algorithms have been devised for this purpose. K-means is one of the most popular algorithms for gene data clustering because of its simplicity and computational efficiency. However, the K-means algorithm is highly sensitive to the choice of initial cluster centers, so it easily gets trapped in a local optimum when the initial centers are chosen randomly. This paper proposes a deterministic initialization algorithm for K-means (DK-means) that explores a set of probable centers through a constrained bi-partitioning approach. The proposed algorithm is compared with classical K-means with random initialization, with improved K-means variants such as K-means++ and MinMax, and with three deterministic initialization methods. Experimental analysis on gene expression datasets demonstrates that DK-means converges faster and more stably, and yields better cluster quality, than the other algorithms.
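The sensitivity to initial centers described in the abstract can be illustrated with a small sketch. This is not the DK-means algorithm, whose details are not given in this preview. It is a plain Lloyd-style K-means in pure Python, run on a hypothetical four-point dataset with two hand-picked initializations. The function names (`sq_dist`, `assign`, `lloyd_kmeans`), the data, and the starting centers are all assumptions chosen for illustration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
        for p in points
    ]


def lloyd_kmeans(points, init_centers, max_iter=100):
    """Standard K-means iterations starting from the given centers.

    Returns (centers, labels, sse), where sse is the within-cluster
    sum of squared distances.
    """
    centers = [tuple(c) for c in init_centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # New center is the coordinate-wise mean of the members.
                new_centers.append(
                    tuple(sum(x) / len(members) for x in zip(*members))
                )
            else:
                # Keep the old center if a cluster becomes empty.
                new_centers.append(centers[j])
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    labels = assign(points, centers)
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, sse


# Hypothetical toy data: two tight pairs, far apart along the x-axis.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Initialization A: one center near each pair.
_, labels_a, sse_a = lloyd_kmeans(points, [(0.0, 0.0), (4.0, 0.0)])

# Initialization B: both centers in the middle, split along the y-axis.
_, labels_b, sse_b = lloyd_kmeans(points, [(2.0, 0.0), (2.0, 1.0)])

print(labels_a, sse_a)  # [0, 0, 1, 1] 1.0
print(labels_b, sse_b)  # [0, 1, 0, 1] 16.0
```

Both runs converge, but initialization B stops at a local optimum whose sum of squared errors is sixteen times larger. A deterministic initialization scheme aims to avoid this kind of run-to-run variation by always starting from the same, carefully chosen centers.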



Author information

Corresponding author

Correspondence to R. Jothi.


Cite this article

Jothi, R., Mohanty, S.K. & Ojha, A. DK-means: a deterministic K-means clustering algorithm for gene expression analysis. Pattern Anal Applic 22, 649–667 (2019). https://doi.org/10.1007/s10044-017-0673-0

  • DOI: https://doi.org/10.1007/s10044-017-0673-0
