Personal and Ubiquitous Computing, Volume 22, Issue 3, pp 615–619

Effective cancer subtyping by employing density peaks clustering by using gene expression microarray

  • Rashid Mehmood
  • Saeed El-Ashram
  • Rongfang Bie
  • Yunchuan Sun
Original Article


Discovering groups of similar samples is a common first step in the analysis of biomedical data, because such groups cannot be identified manually. Many supervised and unsupervised machine learning and statistical approaches have been developed to solve this problem. Clustering is an unsupervised learning approach that organizes data into groups of similar items and is used to discover the intrinsic hidden structure of the data. In this paper, we use the clustering by fast search and find of density peaks (CDP) approach for cancer subtyping and for distinguishing normal tissues from tumor tissues. In addition, we address the impact of preprocessing and of the underlying distance matrix on the final groups. We performed extensive experiments on real-world and synthetic cancer gene expression microarray data sets and compared the results with state-of-the-art clustering approaches.
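As a rough illustration of the density peaks idea named above, the following Python sketch follows the general published description of the method (Rodriguez and Laio, 2014): each point gets a local density and a distance to its nearest point of higher density, points where both are large are taken as cluster centers, and every other point joins the cluster of its nearest higher-density neighbor. This is not the authors' implementation. The Gaussian density kernel, the helper names `zscore_columns` and `density_peaks`, the cutoff `d_c`, and the toy two-dimensional data are all assumptions made for illustration; the abstract does not state which preprocessing or distance measures the paper uses.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def zscore_columns(rows):
    """Hypothetical preprocessing step: scale each feature (column)
    to zero mean and unit variance; constant columns are left at zero."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def density_peaks(points, d_c, n_clusters, dist=euclidean):
    """Minimal density peaks clustering sketch.

    d_c is the density cutoff scale, n_clusters the number of centers
    to keep, and dist any pairwise distance function (an assumption:
    swapping it changes the distance matrix and hence the grouping).
    """
    n = len(points)
    d = [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]

    # Local density rho_i, here with a Gaussian kernel (assumed variant).
    rho = [sum(math.exp(-(d[i][j] / d_c) ** 2) for j in range(n) if j != i)
           for i in range(n)]

    # Visit points from highest to lowest density.
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n          # distance to nearest higher-density point
    nearest_higher = [-1] * n  # index of that point
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(d[i])  # convention for the densest point
            continue
        j = min(order[:rank], key=lambda k: d[i][k])
        delta[i] = d[i][j]
        nearest_higher[i] = j

    # Centers are the points with the largest rho * delta.
    centers = sorted(order, key=lambda i: -(rho[i] * delta[i]))[:n_clusters]
    labels = [-1] * n
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id

    # Remaining points inherit the label of their nearest denser neighbor.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, centers


# Toy data: two well-separated groups of five points each.
samples = [
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1], [5.05, 5.05],
]
labels, centers = density_peaks(samples, d_c=0.2, n_clusters=2)
print(labels, centers)
```

Passing a different `dist` function, or running `zscore_columns` on the data first, changes the pairwise distance matrix and therefore the densities and the resulting groups, which is the kind of sensitivity to preprocessing and distance choice the abstract refers to.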


Gene expression microarray, Data mining, Clustering, Density peaks



This research is sponsored by the National Natural Science Foundation of China (No. 61571049, 61371185, 61401029, 61472044, and 61472403) and the Fundamental Research Funds for the Central Universities (No. 2014KJJCB32 and 2013NT57) and by SRF for ROCS, SEM.

Compliance with ethical standards

Conflict of interest

The authors declare that they have no competing interests.



Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2018

Authors and Affiliations

  • Rashid Mehmood (1)
  • Saeed El-Ashram (2)
  • Rongfang Bie (1)
  • Yunchuan Sun (3)
  1. College of Information Science and Technology, Beijing Normal University, Beijing, China
  2. Faculty of Science, Kafr El-Sheikh University, Kafr El-Sheikh, Egypt
  3. Business School, Beijing Normal University, Beijing, China
