Effective cancer subtyping by employing density peaks clustering by using gene expression microarray
Discovering groups of similar samples is a common first step in the analysis of biomedical data, where such groups cannot be identified manually. Many supervised and unsupervised machine learning and statistical approaches have been developed to solve this problem. Clustering is an unsupervised learning approach that organizes data into similar groups and is used to discover the intrinsic hidden structure of the data. In this paper, we use the clustering by fast search and find of density peaks (CDP) approach for cancer subtyping and for distinguishing normal tissues from tumor tissues. In addition, we address the impact of preprocessing and of the underlying distance matrix on the final groups. We performed extensive experiments on real-world and synthetic cancer gene expression microarray data sets and compared the results with state-of-the-art clustering approaches.
Keywords: Gene expression microarray, Data mining, Clustering, Density peaks
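The abstract names the CDP approach without spelling out its steps, so the following is only a minimal sketch of the general density peaks idea, not the authors' implementation. It assumes Euclidean distance, a Gaussian-kernel density, and a user-supplied cutoff `dc`; the function name `density_peaks` and all parameter names are hypothetical, and the paper's own preprocessing and distance choices may differ.

```python
import math


def density_peaks(points, n_clusters, dc):
    """Sketch of density peaks clustering (assumed settings, see lead-in).

    points: list of equal-length numeric sequences (e.g. expression profiles).
    n_clusters: number of cluster centres to select.
    dc: cutoff distance controlling the density kernel width (assumption).
    """
    n = len(points)
    # Pairwise Euclidean distances; another distance could be swapped in here.
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density rho_i: Gaussian-weighted count of the other points near i.
    rho = [
        sum(math.exp(-((dist[i][j] / dc) ** 2)) for j in range(n) if j != i)
        for i in range(n)
    ]

    # Indices ordered from densest to sparsest.
    order = sorted(range(n), key=lambda i: -rho[i])

    # delta_i: distance to the nearest point that precedes i in density order.
    # The densest point gets its largest distance to any other point.
    delta = [0.0] * n
    parent = [-1] * n
    delta[order[0]] = max(dist[order[0]])
    for rank in range(1, n):
        i = order[rank]
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i], parent[i] = dist[i][j], j

    # Centres are the points with the largest rho * delta.
    centers = sorted(range(n), key=lambda i: -rho[i] * delta[i])[:n_clusters]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c

    # Remaining points inherit the label of their nearest denser neighbour,
    # visited from densest to sparsest so the neighbour is already labelled.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers
```

In this sketch the pairwise distance computation is the only place where the choice of distance enters, which is where the effect of the distance matrix mentioned in the abstract would show up. The cutoff `dc` has to be tuned to the scale of the data.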
This research is sponsored by the National Natural Science Foundation of China (No. 61571049, 61371185, 61401029, 61472044, and 61472403) and the Fundamental Research Funds for the Central Universities (No. 2014KJJCB32 and 2013NT57) and by SRF for ROCS, SEM.
Compliance with ethical standards
Conflict of interest
The authors declare that they have no competing interests.