Epithelial-Mesenchymal Transition Regulatory Network-Based Feature Selection in Lung Cancer Prognosis Prediction

  • Borong Shao
  • Tim Conrad
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9656)


Feature selection is often applied to identify cancer prognosis biomarkers. However, many feature selection methods are prone to over-fitting or yield features with poor biological interpretability when applied to high-dimensional biological data. Network-based feature selection and data integration have been proposed to identify more robust biomarkers. We conducted experiments to investigate the advantages of these two approaches using the epithelial-mesenchymal transition (EMT) regulatory network, which has been shown to be highly relevant to cancer prognosis. We obtained data from The Cancer Genome Atlas and made prognosis predictions using a Support Vector Machine. Under our experimental settings, network-based features gave significantly more accurate predictions than individual molecular features, and features selected from integrated data (RNA-Seq and microRNA data) gave significantly more accurate predictions than features selected from single-source data (RNA-Seq data). Our study indicates that biological network-based feature transformation and data integration are two useful approaches for identifying robust cancer biomarkers.
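The abstract does not say how individual molecular measurements are turned into network-based features, so the following is only a minimal sketch of one plausible transformation, not the authors' method. It assumes each network motif is a small set of molecules (genes and microRNAs) and that a motif's feature value is the mean z-score of its members. The names `zscore_columns`, `motif_features`, `gene_1`, `gene_2`, `mirna_1`, `motif_A` and `motif_B`, and all numeric values, are hypothetical. The classification step (Support Vector Machine) is left out.

```python
from statistics import mean, pstdev


def zscore_columns(samples, features):
    """Standardize each feature across samples (population std).

    A feature with zero variance is mapped to 0.0 for every sample.
    """
    stats = {}
    for f in features:
        values = [s[f] for s in samples]
        stats[f] = (mean(values), pstdev(values))
    return [
        {
            f: (s[f] - stats[f][0]) / stats[f][1] if stats[f][1] > 0 else 0.0
            for f in features
        }
        for s in samples
    ]


def motif_features(samples, motifs):
    """Return one value per motif and sample: the mean z-score of its members.

    ASSUMPTION: mean-of-z-scores aggregation is an illustrative choice only.
    """
    features = sorted({m for members in motifs.values() for m in members})
    z = zscore_columns(samples, features)
    return [
        {name: mean([row[m] for m in members]) for name, members in motifs.items()}
        for row in z
    ]


# Toy integrated data: two samples, RNA-Seq-like and microRNA-like values
# side by side (made-up numbers, hypothetical molecule names).
samples = [
    {"gene_1": 1.0, "gene_2": 2.0, "mirna_1": 3.0},
    {"gene_1": 3.0, "gene_2": 2.0, "mirna_1": 1.0},
]

# Hypothetical motifs; motif_B mixes a gene with a microRNA (data integration).
motifs = {
    "motif_A": ["gene_1", "gene_2"],
    "motif_B": ["gene_1", "mirna_1"],
}

result = motif_features(samples, motifs)
```

In this toy case `motif_A` evaluates to -0.5 for the first sample and 0.5 for the second, while `motif_B` is 0.0 for both because its two members move in opposite directions. The resulting per-motif values would then replace the individual molecular measurements as classifier inputs.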


Cancer prognosis prediction, Epithelial mesenchymal transition, Feature selection, Data integration, Network motif



This study was funded by the German Ministry of Research and Education (BMBF) Project Grant 3FO18501 (Forschungscampus MODAL).



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
  2. Zuse Institute Berlin, Berlin, Germany
