DK-means: a deterministic K-means clustering algorithm for gene expression analysis

Jothi, R.; Mohanty, Sraban Kumar; Ojha, Aparajita

doi:10.1007/s10044-017-0673-0

Theoretical Advances
Published: 28 December 2017

Volume 22, pages 649–667, (2019)
Pattern Analysis and Applications

Abstract

Clustering is widely used to interpret the underlying patterns in microarray gene expression profiles, and many clustering algorithms have been devised for this purpose. K-means is one of the most popular algorithms for clustering gene data because of its simplicity and computational efficiency. However, K-means is highly sensitive to the choice of initial cluster centers, so it is easily trapped in a local optimum when the initial centers are chosen randomly. This paper proposes a deterministic initialization algorithm for K-means (DK-means) that explores a set of probable centers through a constrained bi-partitioning approach. The proposed algorithm is compared with classical K-means with random initialization, with improved K-means variants such as K-means++ and MinMax, and with three deterministic initialization methods. Experimental analysis on gene expression datasets demonstrates that DK-means converges faster and more stably, and yields better cluster quality, than the other algorithms.
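The sensitivity to initial centers that motivates the paper can be seen on a toy example. The sketch below is not the DK-means algorithm (the abstract does not give its details); it is plain Lloyd-style K-means run from two hand-picked initializations. The dataset `POINTS` and the helper names `sq_dist`, `centroid`, and `kmeans` are assumptions made purely for illustration.

```python
# Illustrative sketch only: standard K-means (Lloyd's iterations) started from
# explicit initial centers. This is NOT the paper's DK-means initialization.

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def kmeans(points, init_centers, max_iter=100):
    """Run K-means from the given initial centers; return (centers, sse)."""
    centers = [tuple(c) for c in init_centers]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            centroid(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:  # converged: no center moved
            break
        centers = new_centers
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse


# Hypothetical toy data: corners of a 4 x 1 rectangle, clustered with K = 2.
POINTS = [(0, 0), (0, 1), (4, 0), (4, 1)]

# One center on each short side: converges to the left/right split.
good_centers, good_sse = kmeans(POINTS, [(0, 0), (4, 0)])

# Both centers on the same short side: converges to a top/bottom split,
# which is a local optimum with a much larger sum of squared errors.
bad_centers, bad_sse = kmeans(POINTS, [(0, 0), (0, 1)])

print(good_sse, bad_sse)  # 1.0 16.0
```

Both runs converge, but to different partitions: the first start reaches a sum of squared errors of 1.0 and the second stalls at 16.0. Random initialization can land on either kind of start, whereas a deterministic initialization always produces the same starting centers and hence the same result for a given dataset.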

Author information

Authors and Affiliations

Department of Computer Engineering, School of Technology, Pandit Deendayal Petroleum University, Gandhinagar, India
R. Jothi
Indian Institute of Information Technology, Design and Manufacturing Jabalpur, Jabalpur, Madhya Pradesh, India
Sraban Kumar Mohanty & Aparajita Ojha

Corresponding author

Correspondence to R. Jothi.

About this article

Cite this article

Jothi, R., Mohanty, S.K. & Ojha, A. DK-means: a deterministic K-means clustering algorithm for gene expression analysis. Pattern Anal Applic 22, 649–667 (2019). https://doi.org/10.1007/s10044-017-0673-0

Received: 03 March 2017
Accepted: 08 December 2017
Published: 28 December 2017
Issue Date: 01 May 2019
DOI: https://doi.org/10.1007/s10044-017-0673-0
