A Betweenness Centrality Guided Clustering Algorithm and Its Applications to Cancer Diagnosis

  • R. Jothi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10682)


Clustering has become an important data analysis technique for cancer discovery. Numerous clustering approaches have been proposed in recent years. However, handling high-dimensional cancer gene expression datasets remains an open challenge for clustering algorithms. In this paper, we present an improved graph-based clustering algorithm that applies an edge betweenness criterion to a spanning subgraph. We carry out an empirical analysis on artificial datasets and five cancer gene expression datasets. The results show that the proposed algorithm can effectively discover cancerous tissues and that it performs better than two recent graph-based clustering algorithms in terms of both cluster quality and modularity index.
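The abstract does not say how the spanning subgraph is built or how the betweenness criterion is applied, so the following is only a generic illustration of the idea, not the paper's algorithm. It assumes a subgraph made of k-nearest-neighbour edges plus minimum spanning tree edges, and it removes the edge with the highest betweenness until the requested number of components remains (a Girvan-Newman-style divisive scheme). All function names and the parameter `k` are hypothetical.

```python
from collections import deque
import math


def knn_mst_graph(points, k=2):
    """Assumed construction: k-NN edges plus MST edges over Euclidean distance."""
    n = len(points)
    dist = lambda i, j: math.dist(points[i], points[j])
    adj = {i: set() for i in range(n)}
    # Symmetric k-nearest-neighbour edges.
    for i in range(n):
        nearest = sorted((j for j in range(n) if j != i), key=lambda j: dist(i, j))[:k]
        for j in nearest:
            adj[i].add(j)
            adj[j].add(i)
    # Prim's algorithm: adding the MST edges guarantees a connected graph.
    in_tree = {0}
    while len(in_tree) < n:
        i, j = min(((a, b) for a in in_tree for b in range(n) if b not in in_tree),
                   key=lambda e: dist(*e))
        adj[i].add(j)
        adj[j].add(i)
        in_tree.add(j)
    return adj


def edge_betweenness(adj):
    """Brandes-style accumulation on an unweighted graph.

    Every vertex acts as a source, so each undirected pair is counted twice;
    this does not change which edge has the largest score.
    """
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        order, depth = [], {s: 0}
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0.0)
        sigma[s] = 1.0
        queue = deque([s])
        while queue:  # breadth-first search counting shortest paths
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in depth:
                    depth[w] = depth[v] + 1
                    queue.append(w)
                if depth[w] == depth[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):  # back-propagate path dependencies
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                bet[frozenset((v, w))] += c
                delta[v] += c
    return bet


def components(adj):
    """Connected components found by depth-first search."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def betweenness_clustering(points, n_clusters, k=2):
    """Remove the highest-betweenness edge until n_clusters components exist.

    Assumes n_clusters <= len(points).
    """
    adj = knn_mst_graph(points, k)
    while len(components(adj)) < n_clusters:
        bet = edge_betweenness(adj)
        u, v = max(bet, key=bet.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)
```

On two well-separated groups of points, the single edge that bridges the groups carries every inter-group shortest path, so it receives the highest score and is removed first. Recomputing betweenness after each removal is expensive on large graphs; the sketch favours clarity over speed.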


Keywords: Clustering; Cancer diagnosis; Betweenness; Spanning subgraph; Minimum spanning tree



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Department of Computer Engineering, School of Technology, Pandit Deendayal Petroleum University, Gandhinagar, India
